Abstract

The demand for Internet services that require frequent updates through small messages, such as
microblogging, has grown tremendously in the past few years. Although the use of such applications by
domestic users is usually free, their access from mobile devices is subject to fees and consumes energy 
from limited batteries. If a user activates his mobile device and is in range of a service provider, a content 
update is received at the expense of monetary and energy costs. Thus, users face a tradeoff between such 
costs and their messages aging. The goal of this paper is to show how to cope with such a tradeoff, by 
devising aging control policies. An aging control policy consists of deciding, based on the current utility 
of the last message received, whether to activate the mobile device, and if so, which technology to use 
(WiFi or 3G). We present a model that yields the optimal aging control policy. Our model is based on 
a Markov Decision Process in which states correspond to message ages. Using our model, we show the 
existence of an optimal strategy in the class of threshold strategies, wherein users activate their mobile 
devices if the age of their messages surpasses a given threshold and remain inactive otherwise. We then 
consider strategic content providers (publishers) that offer bonus packages to users, so as to incent them to 
download updates of advertisement campaigns. We provide simple algorithms for publishers to determine 
optimal bonus levels, leveraging the fact that users adopt their optimal aging control strategies. The 
accuracy of our model is validated against traces from the UMass DieselNet bus network. 



1 Introduction 

The demand for Internet services that require frequent updates through small messages has grown
tremendously in the past few years. While the popularity of traditional applications of that kind, such as weather
forecasts, traffic reports and news, is unlikely to decline, novel applications, such as Twitter [1], have arisen.
Twitter alone recorded a 1,500% increase in the number of registered users since 2006, and currently has
more than 100 million users worldwide. As a second example, mobile messaging with Exchange ActiveSync [5]
allows smartphone users to receive timely updates when new data items arrive in their mailboxes.
Today, Exchange ActiveSync is supported by more than 300 million mobile devices worldwide. Henceforth,
for the sake of concreteness we focus on microblogging applications, such as Twitter or similar news feeds.

Users of microblogging applications join interest groups and aim at receiving small messages from editors. 
As messages age, they get outdated and their utilities decrease. As a consequence, users must control when 
to receive updates. A user willing to receive an update activates his mobile device, which then broadcasts
periodic beacons to announce its demand to service providers.

Although the use of microblogging applications by domestic users is usually free, their access from mobile 
devices consumes energy from limited batteries and is subject to fees. We consider users that can access the 
Internet either through a WiFi or 3G network. The 3G network provides broader coverage, but its usage 
requires a subscription to a cell phone data plan. More generally, our results apply to any scenario in which 
users can access the network through multiple interfaces with different costs and ubiquity [3], [4].

Let the age of a message held by a user be the duration of the interval of time since the message was 
downloaded by such user. If a user activates his mobile device and is in the range of a service provider (WiFi 
access point or 3G antenna), an update is received and the age of the message held by the user is reset to one, 
at the expense of the previously mentioned monetary and energy costs. Thus, users face a tradeoff between 
energy and monetary costs and their messages aging. To cope with such a tradeoff, users decide, based on the 
age of the stored message, whether to activate the mobile device, and if so, which technology to use (WiFi 
or 3G). We refer to a policy which determines activation decisions as a function of message ages as an aging 
control policy. The first goal of this paper is to devise efficient aging control policies. 

Strategic content providers can incent users to download updates of advertisement campaigns or unpopular 
content by offering bonus packages. The goal of the bonus package, translated in terms of our aging control 
problem, consists of minimizing the average age of content held by users, subject to a budget on the number 
of messages transmitted per time slot, as dictated by the service provider capacity. Although nowadays 
bonus packages are set exclusively by service providers [5], we envision that in the future content providers 
will reach agreements with service providers. Through such agreements, content providers, also known as 
publishers, will play an important role in the settlement of bonus packages [5]. The second goal of this paper 
is to solve the publishers' bonus selection problem, when users adopt an aging control policy.

We pose the following two questions:

1. What is the users' optimal aging control policy?

2. Leveraging the users' optimal aging control policy, what is the publishers' optimal bonus strategy?

We propose a model that allows us to answer the questions above. Our model accounts for energy costs, 
prices and the utility of messages as a function of their age. Using our model, we show that users can maximize 
their utilities by adopting a simple threshold policy. The policy consists of activating the mobile device if the 
content age surpasses a given threshold and remaining inactive otherwise. We derive properties of the optimal 
threshold, and a closed-form expression for the average reward obtained by users as a function of the selected 
strategies. We then show the accuracy of our approach using traces collected from the UMass DieselNet
bus network. Using traces, we also study location-aware policies, according to which users can activate their
devices based on their position on campus, and compare them against location-oblivious policies. 

For the strategic publishers, we present two simple algorithms to solve the bonus determination problem 
posed above. The first algorithm presumes complete information while the second consists of a learning 
algorithm for publishers that have imperfect information about the system parameters, and is validated 
using trace-driven simulations. Finally, we show the convergence of the proposed learning algorithm, making 
use of results on differential inclusions and stochastic approximations. 

In summary, we make the following contributions. 

Model formulation: We introduce the aging control problem, and propose a model to solve it. Using 
the model, we derive properties about the optimal aging control policy and closed-form expressions for the 
expected average reward. 

DieselNet trace analysis: We quantify how aging control policies impact users of the DieselNet bus 
network. Using traces collected from DieselNet, we show the accuracy of our model estimates and analyze 
policies that are out of the scope of our model. 

Mechanism design: We provide two simple algorithms for publishers to incent users to download 
advertisement updates, leveraging the fact that users adopt their optimal aging control strategies. We 
formally show the convergence of the proposed learning algorithm, and numerically investigate its accuracy 
and convergence rate using trace-driven simulations. 

The remainder of this paper is organized as follows. After further discussing the need for aging control
in §2, we present our model in §3, and in §4 we report results obtained with traces from the UMass DieselNet bus
network. Our model analysis is shown in §5, followed by the solution of the publishers' problem in §6. We
present a discussion of our modeling assumptions in §7, followed by related work, and then conclude the paper.

2 Why Aging Control? 

The goal of an aging control policy is to provide high quality of service while 1) reducing energy consumption 
and 2) reducing 3G usage, by leveraging WiFi connectivity when available. Whereas the reduction in energy 
consumption is of interest mainly to subscribers, the reduction in 3G usage is of interest both to service 
providers and subscribers. Next, we discuss a few issues related to the adoption of aging control from the
service providers' and subscribers' standpoints.

2.1 Service Provider Standpoint: Limited Spectrum 

The increasing demand for mobile Internet access is creating pressure on the service providers, whose limited
spectrum might not be sufficient to cope with the demand [3]. To deal with such pressure, some wireless
providers are offering incentives to subscribers to reduce their 3G usage by switching to WiFi [8]. Therefore,
it is in the best interest of the service providers to devise efficient aging control policies for their users. Aging
control policies can not only reduce the pressure on the 3G spectrum, but also reduce the costs to service
providers that support hybrid 3G/WiFi networks (with savings of up to 60% against providers that offer only
3G).

        Coverage   Energy consumption                        Monetary cost
WiFi    10%-20%    0.63 J (Android G1), 1.18 J (Nokia N95)   free
3G      80%-90%                                              0.20 USD (SMS in USA)

Table 1: Coverage, energy consumption and monetary costs

2.2 Users Standpoint: Energy Consumption, Monetary Costs and Coverage 

How do the coverage, energy consumption and monetary costs vary between 3G and WiFi? Table 1 illustrates
the answer with data obtained from [3], [8], [10], [11] and is further discussed next.

Energy consumption and AP scanning The energy efficiency of WiFi and 3G radios differs significantly. 
WiFi users actively scan for APs by broadcasting probes and waiting for beacons sent by the APs. If multiple 
beacons are received, users connect to the AP with highest signal strength. The association overhead, incurred 
when searching and connecting to the APs, yields substantial energy costs. Niranjan et al. [9] report that the
energy consumption of a WiFi AP scan is 0.63 J on the Android G1 and 1.18 J on the Nokia N95, taking roughly 1 and
2 seconds, respectively (note that in one time slot users might perform multiple AP scans). In the remainder
of this work, to simplify presentation, we consider only the energy costs related to association overhead. In 
many scenarios of practical interest WiFi and 3G radios incur comparable energy costs after the handshaking 
phase, and our model can be easily adapted to account for cases in which such energy costs differ. 

Monetary costs Whereas free WiFi hotspots are gaining popularity [12], [13], the use of 3G is still associated
with monetary costs. For example, in the United States and in Australia, it typically costs 0.20 USD to send
an SMS through a 3G network [10]. As a second example, in Brazil subscribers of Claro 3G incur a cost of
USD 0.10 per megabyte after exceeding their monthly quota [14].

Coverage It is well known that the coverage of 3G is much broader than the coverage of WiFi. This is 
because the 3G towers are placed by operators so as to achieve almost perfect coverage, while WiFi access 
points are distributed and activated in an ad hoc fashion. For instance, it has been reported in [3] that WiFi
and 3G are available roughly 12% and 90% of the time, respectively, in the town of Amherst, Massachusetts.

Approximations In light of the above observations, in the rest of this paper, unless otherwise stated, we
consider the following three approximations concerning energy consumption, monetary costs, and coverage. 
We assume that 1) the energy consumption of scanning for WiFi access points dominates the energy costs of 
the 3G and WiFi radios, 2) WiFi access points are open and freely available, whereas the use of 3G incurs a 
monetary cost per message and 3) WiFi is intermittently available whereas 3G offers perfect coverage. Note 
that the above three approximations are made solely to simplify presentation: our model has flexibility to
account for different energy consumption and monetary costs of WiFi and 3G, and can be easily adapted to
account for a 3G network that does not have perfect coverage. 

2.3 Adopting Aging Control 

Next, we discuss two key aspects pertaining to the adoption of aging control: 1) the delay tolerance of the
applications and 2) the availability of WiFi access points. 

2.3.1 Delay Tolerant Applications: Age and Utility 

Aging control is useful for applications that can tolerate delays, such as news feeds and email. For these 
applications, users might be willing to tolerate some delay if that translates into reduced energy consumption 
or monetary costs. 

Workload The workload subject to aging control can be due to one or multiple applications. 

One application: Websites such as Yahoo! provide news feeds on different topics, ranging from business
and economics to entertainment. Niranjan et al. [9] monitored feeds in ten categories, and reported that in
three of these categories (business, top stories and opinion/editorial) at least one new feed is available every
minute, with very high probability [9, Figure 13].

Multiple applications: Aiming at energy savings, it is common practice for users of smartphones to
synchronize the updates of multiple applications, using APIs such as OVI Notifications [15]. In this case, the 
larger the number of applications that require updates, the higher the chances of at least one update being 
available every time slot. 

In the remainder of this paper, we focus on the above mentioned workloads, for which updates are issued
with high frequency. The analysis of light and medium workloads is a subject for future work (see §7).

User Interface How does the utility of a content degrade with its age? The answer to this question is
user and application dependent. It turns out that some applications, such as JuiceDefender [16], already
allow iPhone users to specify their delay tolerance per application so as to save battery. For instance, users
can simply specify a delay threshold, after which they wish to have received an update. In our model, this
corresponds to a utility that has the shape of a step function (see §5.3).

2.3.2 WiFi Access Points 

Open WiFi access points are widespread [8], [12]. The age control mechanisms described in this paper can rely
on such access points, which users encounter in an ad hoc fashion. Alternatively, Internet service providers 
can deploy their own closed networks of WiFi access points, in order to alleviate the load of their 3G networks. 
In France, enterprises such as SFR and Neuf are already adopting such a strategy. The age control mechanisms
presented in this paper can be applied in this setting as well. In this case, users are required to perform
a web-based login in order to access the WiFi spots, which can be done automatically using tools such as
Freewifi [17]. Note that even if mobile users have perfect geographic information about the location of the
access points, random factors like attenuation (fading) and user speed will determine the availability of the 
access points, which will vary temporally and spatially. 

Markov Decision Processes Remaining inactive at a given instant of time is particularly useful if future 
transmission opportunities are available in upcoming time slots. However, as discussed above, WiFi access 
points are intermittently available. Therefore, WiFi users experience a random delay between the instant 
of time at which they activate their devices and the instant at which they obtain content updates. How can we
account for the unpredictability of transmission opportunities in upcoming slots? To answer this question,
in the next section we propose a model, based on a Markov Decision Process, which naturally accounts for 
the effect of the actions at a given time slot on the future states of the system, and allows us to derive the 
structure of the optimal aging control policy. 

3 Model 

We consider mobile users that subscribe to receive content updates. In the microblogging jargon, such users 
are said to follow a content. Content is transmitted from publishers to users, through messages sent by the 
service providers. The age of the message held by a user is defined as the length of time, measured in time 
slots, since the message was downloaded. Note that we assume that updates are available at every time slot
with high probability (see §7).

Let $x_t$ be the age of the message held by a user at time $t$. The age of the message equals one when the
user first receives it, and increases by one every time slot, except when the user obtains an update, at
which time the age is reset to one.

A user can receive message updates when his mobile device is in range of a service provider and the
contact between them lasts for a minimum amount of time, which characterizes a useful contact opportunity.
While the 3G technology is assumed to guarantee perfect coverage, WiFi users are subject to outages. Let
$e_t$ be an indicator random variable equal to 1 if there is a useful contact opportunity with a WiFi provider
at time $t$, and 0 otherwise. We let $p = E[e_t]$ and assume $0 < p < 1$. Next, we state our key modeling
assumption.

Assumption 3.1. Uniform and independent contact opportunity distribution: The probability of 
a useful contact opportunity between a user and WiFi providers is constant and independent across time slots, 
and equals p. 

Under Assumption 3.1, there are no correlations in time between contact opportunities experienced by
a user; this is a strong assumption, since such correlations are present in any mobile network, as illustrated
in §4.1. However, as shown in §4.2, there are scenarios of practical interest in which the uniformity and
independence assumption does not compromise the accuracy of the results obtained using our model. Therefore,
we proceed with our analysis under such an assumption, indicating its implications throughout the paper
and studying scenarios that are out of the scope of our model using real-world traces.

The goal of each user is to minimize the expected age of the content he follows, accounting for energy
and monetary costs. In order to achieve such a goal, users must choose at each time slot $t$ their actions, $a_t$.
The available actions are

$$a_t = \begin{cases} 0, & \text{inactive} \\ 1, & \text{WiFi} \\ 2, & \text{WiFi if } \exists \text{ useful contact opportunity, else 3G} \end{cases}$$

State Dynamics The age of the content followed by a user increases by one if his device is inactive or if it
is not in range of a service provider, and is reset to one otherwise. Let $M$ be the maximum age of a content.
Then,

$$x_{t+1} = \begin{cases} \min(x_t + 1, M), & \text{if } (a_t = 0) \text{ or } (a_t = 1 \text{ and } e_t = 0) \\ 1, & \text{if } (a_t = 2) \text{ or } (a_t = 1 \text{ and } e_t = 1) \end{cases} \tag{1}$$
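To make the state dynamics concrete, the transition in (1) can be written as a small function. The following Python sketch is only illustrative; the function name `next_age` and its argument names are ours and do not appear in the paper.

```python
def next_age(x, action, contact, max_age):
    """Return the age at slot t+1 given the age x, the action and the WiFi contact flag at slot t.

    action:  0 (inactive), 1 (WiFi), 2 (WiFi if a useful contact exists, else 3G)
    contact: 1 if there is a useful WiFi contact opportunity in this slot, 0 otherwise
    """
    if action not in (0, 1, 2):
        raise ValueError("action must be 0, 1 or 2")
    # An update is received with action 2 (3G has perfect coverage)
    # or with action 1 when a WiFi contact is available.
    updated = action == 2 or (action == 1 and contact == 1)
    return 1 if updated else min(x + 1, max_age)
```

For example, an inactive user with age 3 moves to age 4, while a user at the maximum age who fails to find a WiFi contact remains at the maximum age.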

Utility Let $U(x_t)$ be the utility of the followed content at time $t$. We assume that $U(x)$ is a non-increasing
function of $x$, which corresponds to messages that become obsolete with time, and that $U(x) = Z$ if $x \ge M$.

Costs Let $G$ be the cost incurred to maintain the mobile device active, measured in monetary units. Then,
the energy cost $c_t$ is given as a function of $a_t$ as

$$c_t(a_t) = \begin{cases} G, & \text{if } a_t \ge 1 \\ 0, & \text{if } a_t = 0 \end{cases} \tag{2}$$


Service providers charge a price for each message transmitted. The prices charged by WiFi and 3G
providers are $P$ and $P_{3G}$, respectively. When a user receives an update, he is subject to a monetary cost of

$$m_t(a_t, e_t) = \begin{cases} P_{3G}, & \text{if } a_t = 2 \text{ and } e_t = 0 \\ P, & \text{if } a_t \ge 1 \text{ and } e_t = 1 \end{cases} \tag{3}$$

Content providers, also referred to as publishers, can offer bonuses to users that follow their contents.
Such bonuses are set in agreement with service providers, and are transferred to users as credits. Let $B_t$ be
the bonus level set by the content provider, $B_t \le \min(P_{3G}, P)$.

The instantaneous user reward at time $t$, $r_t(x_t, a_t)$, is

$$r_t(x_t, a_t) = U(x_t) - c_t(a_t) - \max(m_t - B_t, 0) \tag{4}$$
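The reward in (4) combines the utility, the energy cost and the monetary cost net of the bonus. A minimal Python sketch follows. The helper names are ours, and we assume that no monetary cost is charged in slots in which no update is received, a case that (3) leaves implicit.

```python
def energy_cost(action, G):
    """Energy cost: G whenever the device is active (action 1 or 2)."""
    return G if action >= 1 else 0.0


def monetary_cost(action, contact, P, P3G):
    """Price paid for an update; zero when no update is received (our assumption)."""
    if action >= 1 and contact == 1:
        return P       # update obtained through WiFi
    if action == 2 and contact == 0:
        return P3G     # no WiFi contact, fall back to 3G
    return 0.0


def reward(x, action, contact, U, G, P, P3G, B):
    """Instantaneous reward: utility minus energy cost minus price net of the bonus B."""
    net_price = max(monetary_cost(action, contact, P, P3G) - B, 0.0)
    return U(x) - energy_cost(action, G) - net_price
```

With the linear utility U(x) = max(10 - x, 0), G = 2, P = 0.5, P3G = 3 and B = 0.2, a WiFi update at age 4 yields 6 - 2 - 0.3 = 3.7, whereas a 3G update at the same age yields 6 - 2 - 2.8 = 1.2.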

Users Strategies The strategy of a user is given by the probability of choosing a given action $a_t$ at each of
the possible states. Without loss of generality, in this paper we restrict to Markovian stationary policies [18,
Chapter 8]. The probability of choosing action $a_t$ at state $x_t$ is denoted by $u(a_t | x_t)$.

Figure 1: A typical bus shift. (a) Bus-AP contacts during a typical bus shift; (b) route of a bus that passes through Haigis Mall.

Problem Definition The problem faced by each user consists of finding the strategy $u$ that maximizes his
expected average reward.

User Problem: Obtain strategy $u$ so as to maximize $E[r; u]$, where

$$E[r; u] = \lim_{\ell \to \infty} \frac{1}{\ell} \sum_{t=0}^{\ell} E[r_t(x_t, a_t); u] \tag{5}$$

In what follows, we drop the subscript $t$ from variables when analyzing the system in steady state.

Optimal Threshold Policy Next, we introduce two special classes of policies, the two-threshold policies
and the threshold policies. Let $s$ and $s_{3G}$ be the WiFi and the 3G thresholds, $1 \le s \le s_{3G} \le M + 1$. A policy
which consists of setting $a = 0$ when $x < s$, $a = 1$ when $s \le x < s_{3G}$, and $a = 2$ if $x \ge s_{3G}$ is referred to
as a two-threshold policy. If users have no access to 3G ($P_{3G} \to \infty$), a policy which consists of backing off if
$x < s$ and being active if $x \ge s$ is referred to as a threshold policy.

Note that while $x \in \{1, \ldots, M\}$, the thresholds assume values in the range $[1, M + 1]$. When $P_{3G} \to \infty$,
$s = M + 1$ means that the user should remain always inactive (refer to Table [5] for notation).
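The two classes of policies can be illustrated with a short Python sketch. The function names are ours, and we assume that the boundary ages x = s and x = s_3G belong to the more active action, which is the reading under which s = M + 1 means that the user is never active.

```python
def two_threshold_action(x, s, s3g):
    """Action of a two-threshold policy at age x: 0 (inactive), 1 (WiFi) or 2 (WiFi, else 3G)."""
    if not 1 <= s <= s3g:
        raise ValueError("thresholds must satisfy 1 <= s <= s3g")
    if x < s:
        return 0
    if x < s3g:
        return 1
    return 2


def threshold_action(x, s, max_age):
    """WiFi-only threshold policy: the 3G threshold is pushed to max_age + 1, so action 2 is never used."""
    return two_threshold_action(x, s, max_age + 1)
```

For instance, with s = 3 and s_3G = 5, ages 1 to 6 map to the actions 0, 0, 1, 1, 2, 2.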

The following proposition reduces the problem of finding the optimal policy for the User Problem to
the one of finding the two thresholds $s$ and $s_{3G}$.

Proposition 3.1. The User Problem admits an optimal policy of two-threshold type such that

if $P_{3G} > G/p + P$,

$$a(x) = \begin{cases} 0, & \text{if } x < s \\ 1, & \text{if } s \le x < s_{3G} \\ 2, & \text{otherwise} \end{cases}$$

if $P_{3G} < G/p + P$,

$$a(x) = \begin{cases} 0, & \text{if } x < s_{3G} \\ 2, & \text{otherwise} \end{cases}$$

If $P_{3G} \to \infty$, $s_{3G} = M + 1$ and the User Problem admits an optimal policy of threshold type.

When the price charged by the 3G providers dominates the costs, the optimal policy consists of activating
the WiFi radio if $s \le x < s_{3G}$ and using the 3G only if $x \ge s_{3G}$. If the price charged by the 3G providers
is dominated by $G/p + P$, though, users are better off relying on the 3G technology, which offers perfect
coverage.

To simplify presentation, in the upcoming sections we assume that 1) users have access only to WiFi
APs ($P_{3G} \to \infty$) and 2) the optimal threshold policy is unique. In Appendix C we 1) show how our results
extend to the scenario in which users can choose between WiFi and 3G and 2) characterize the scenarios
under which the optimal threshold is not unique.
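Under the WiFi-only simplification and Assumption 3.1, the age process regenerates each time an update is received, so the average reward of a threshold policy equals the expected reward per cycle divided by the expected cycle length. A cycle consists of s - 1 inactive slots followed by a geometric number of active slots with mean 1/p. The Python sketch below is our own illustrative calculation based on this renewal argument, not the closed-form expression derived in the paper; the function names and the example values are assumptions.

```python
def average_reward(s, p, M, U, G, P=0.0, B=0.0):
    """Long-run average reward of threshold s (WiFi only, i.i.d. contacts with probability p)."""
    if not 1 <= s <= M + 1:
        raise ValueError("threshold must lie in [1, M + 1]")
    if s == M + 1:
        # Never active: the age climbs to M and stays there forever.
        return float(U(M))
    # Inactive phase: ages 1, ..., s - 1, one slot each.
    cycle_reward = sum(U(x) for x in range(1, s))
    # Active phase: age x in s, ..., M - 1 is reached after x - s failed attempts.
    reach = 1.0
    for x in range(s, M):
        cycle_reward += reach * (U(x) - G)
        reach *= 1.0 - p
    # Age M is reached with probability `reach`; the user then stays there
    # for 1/p slots on average until a contact occurs.
    cycle_reward += reach * (U(M) - G) / p
    # Exactly one update is paid for per cycle, net of the bonus.
    cycle_reward -= max(P - B, 0.0)
    cycle_length = (s - 1) + 1.0 / p
    return cycle_reward / cycle_length


def best_threshold(p, M, U, G, P=0.0, B=0.0):
    """Enumerate thresholds 1, ..., M + 1 and return the one with the highest average reward."""
    return max(range(1, M + 2), key=lambda s: average_reward(s, p, M, U, G, P, B))
```

As a small check, with M = 3, U(x) = max(M - x, 0), p = 0.5 and G = 1, the thresholds 1 to 4 give average rewards 1/4, 1/3, 1/4 and 0, so the enumeration selects s = 2.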

Definition 3.1. The threshold $s^\star$ of the optimal policy is

$$s^\star = \operatorname{argmax}_s \{E[r; s]\} \tag{6}$$

The properties of the optimal threshold are discussed in the following section in light of traces from
DieselNet, and are formally stated in §5.

4 Evaluating Aging Control Policies in DieselNet

In this section we use traces collected from the UMass Amherst DieselNet [18] to evaluate how aging control
policies perform in practice. Our goals are 1) to show the accuracy of our model predictions, 2) to assess
the optimality of the model predictions in the class of threshold policies and 3) to compare the optimal
policy obtained with our model against policies that are out of the scope of our model, such as location-aware
policies.

The users of the microblogging application considered in this section are passengers and drivers of buses.
We assume that users are interested in following one of the heavy load news feeds described in §2.3.1, for
which updates are issued every minute with high probability. Time is divided into slots of 5 minutes.

We begin with an overview of statistics collected from the traces and their implications on aging control.

4.1 Measuring Contact Statistics

To characterize the update opportunities experienced by the users, we analyze contacts between buses and
access points (APs) at the UMass campus. The traces were compiled during Fall 2007 from buses running
routes serviced by UMass Transit. Each bus scans for connection with APs on the road, and when found,
connects to the AP and records the duration of the connection [19].

Figure 2: Useful contact opportunity statistics (a) time between contacts to AP at Haigis Mall; (b) CDF of
p; (c) scatter plot of opportunities in consecutive slots.

Figure 1(a) illustrates a sample of the trace data. In Figure 1(a), each cross corresponds to a contact
between the bus and an AP. The x and y coordinates of each cross correspond to the time of the day at
which the contact occurred and the duration of the contact, respectively.

In the rest of this paper, we assume that when the mobile devices are active, scans for access points occur
every 20 seconds, and last up to the moment at which an access point is found. Ra et al. [11, §5] empirically
determined that scanning once every 20 seconds yields a good balance between efficiency and low energy
expenditure. An access point is considered useful once it is scanned in two consecutive intervals of 20 seconds.
Henceforth, we refer to contact opportunities that last at least 20 seconds as useful contact opportunities, or
contacts for short, when the qualification is clear from the context. Time slots in which at least one useful
contact opportunity begins are referred to as useful slots.
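The definitions of useful contact opportunities and useful slots can be sketched in Python as follows. The representation of a trace as a list of (start, duration) pairs in seconds and the function name `useful_slots` are assumptions made for illustration; the 5-minute slot length and the 20-second minimum duration follow the text.

```python
import math


def useful_slots(contacts, shift_seconds, slot_seconds=300, min_duration=20):
    """Map bus-AP contacts to a list of 0/1 flags, one per time slot.

    contacts: iterable of (start, duration) pairs, in seconds since the start of the shift.
    A slot is flagged 1 if at least one contact lasting at least min_duration begins in it.
    """
    n_slots = math.ceil(shift_seconds / slot_seconds)
    slots = [0] * n_slots
    for start, duration in contacts:
        if duration >= min_duration and 0 <= start < shift_seconds:
            slots[int(start // slot_seconds)] = 1
    return slots
```

For a 20-minute shift with contacts (10, 5), (320, 25), (650, 40) and (700, 30), the first contact is too short to be useful and the resulting flags are 0, 1, 1, 0.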

Figure 1(b) shows the map of one of the bus routes considered in this paper, with the Haigis Mall
highlighted. The Haigis Mall is a central location for the transit of the buses at UMass, and in this work we
restrict to routes that pass through it. In Figure 1(a), vertical lines correspond to instants at which the bus
passed through the Haigis Mall. A bus run corresponds to the interval between two arrivals at Haigis Mall.
A bus shift corresponds to a sequence of consecutive and uninterrupted bus runs, in the same day. Note that
a bus run in Figure 1(a) takes around 40 minutes. A typical bus run varies between 40 minutes and 1 hour
and 20 minutes (see Figure 2(a)).

Figure 1(a) shows that roughly every time the bus passes through Haigis Mall there is a useful contact
opportunity. This observation has important implications on the aging control policy. In particular, it
indicates that users that are location-aware can take advantage of such information in order to devise their
activation strategies. We will evaluate the performance of the policy which consists of activating the mobile
device only at Haigis Mall in §4.2.4.

Next, our goal is to study the distribution of contact opportunities between buses and APs. In particular,
we wish to identify the extent to which the uniformity and independence assumption (Assumption 3.1) holds
in practice, especially when buses are far away from Haigis Mall (see Figure 1). To this aim, for each day
and for each bus shift, we generate out of our traces a string of ones and zeros, corresponding to useful and
unuseful slots, respectively. Such strings are used to plot Figures 2(b) and 2(c), as well as an input to our
trace-driven simulator, described in the next section.

Figure 2(b) shows the CDF of the bus-AP contact probability (p) across bus shifts. The median of p is
0.53. The contact probability is above 0.3 for up to 85% of the bus shifts and below 0.7 for more than 90% 
of the bus shifts. 

Figure 2(c) shows a scatter plot of the contact probabilities in two consecutive slots. Each cross corresponds
to a bus shift. A cross with coordinates x and y corresponds to a bus shift in which the probability of
no contact in slot t after no contact in slot t - 1 is x and the probability of no contact in slot t after a contact
in slot t - 1 is y. The figure indicates that when x varies between 0.3 and 0.7, a significant fraction of points
is close to the line x = y. This behavior is similar to the one expected in case contacts are approximately
uniform and independent from each other.
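The per-shift statistics discussed above can be estimated directly from the string of ones and zeros of a bus shift. The Python sketch below is illustrative; the function name `contact_stats` and the choice of returning `None` when a conditional probability cannot be estimated are ours. Under Assumption 3.1 both conditional probabilities equal 1 - p, which places the corresponding point on the line x = y.

```python
def contact_stats(slots):
    """Estimate (p, x, y) from a list of 0/1 useful-slot flags.

    p: fraction of useful slots
    x: probability of no contact in a slot given no contact in the previous slot
    y: probability of no contact in a slot given a contact in the previous slot
    """
    p = sum(slots) / len(slots)
    pairs = list(zip(slots, slots[1:]))
    after_no_contact = [cur for prev, cur in pairs if prev == 0]
    after_contact = [cur for prev, cur in pairs if prev == 1]
    x = after_no_contact.count(0) / len(after_no_contact) if after_no_contact else None
    y = after_contact.count(0) / len(after_contact) if after_contact else None
    return p, x, y
```

For the string 1, 0, 0, 1, 1, 0, 1, 0 the estimates are p = 0.5, x = 1/3 and y = 3/4.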

4.2 Evaluating Aging Control Policies 

In this section we validate our model against traces. We begin by describing our methodology and reference 
configuration, and then consider both location-oblivious and location-aware policies. 

4.2.1 Methodology and Reference Configuration 

To validate our model we use the traces described in the previous section. We assume that users do not have
a cell phone data plan ($P_{3G} \to \infty$) and WiFi is free ($P = B = 0$).

The computation of the optimal policy using the proposed model requires estimates of p, U(x) and G.
For a given bus shift, p is estimated as the number of useful slots in that shift divided by the total number of
slots. Note that to compute the optimal strategy using our model we assume knowledge of p, but not of the
distribution of contact times and durations. When searching for the optimal threshold strategy using traces,
in contrast, we perform trace-driven simulations. Our simulator, as well as extensive statistics obtained from
the traces, are available at http://www-net.cs.umass.edu/~sadoc/agecontrol/. For each bus shift, our
simulator takes as input the string of ones and zeros corresponding to useful and unuseful slots, and computes
the reward experienced by a user adopting a given activation policy. The strategy that yields the highest
reward corresponds to the optimal trace-driven policy.
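The trace-driven evaluation described above can be sketched as follows. This Python fragment is not the authors' simulator; it is a minimal illustration under our own assumptions, namely that the user starts the shift with a fresh message (age one), that only WiFi is available, and that the function names `simulate` and `best_trace_threshold` are hypothetical.

```python
def simulate(slots, s, M, U, G, P=0.0, B=0.0, initial_age=1):
    """Average reward of threshold s replayed over a list of 0/1 useful-slot flags."""
    x = initial_age
    total = 0.0
    for contact in slots:
        active = x >= s
        r = U(x)
        if active:
            r -= G                      # energy cost of keeping the device active
            if contact == 1:
                r -= max(P - B, 0.0)    # price of the update, net of the bonus
        total += r
        # The age is reset on a successful update and grows (up to M) otherwise.
        x = 1 if (active and contact == 1) else min(x + 1, M)
    return total / len(slots)


def best_trace_threshold(slots, M, U, G, P=0.0, B=0.0):
    """Threshold in 1, ..., M + 1 with the highest trace-driven average reward."""
    return max(range(1, M + 2), key=lambda s: simulate(slots, s, M, U, G, P, B))
```

For the short string 0, 1, 0, 0, 1, 0 with M = 4, U(x) = max(M - x, 0) and G = 1, thresholds 1 to 5 yield average rewards 4/3, 11/6, 1, 7/6 and 1, so the search returns s = 2.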

In our reference setting, the utility of messages decays linearly during a bus run, and remains zero
afterwards, U(x) = max(M - x, 0). We vary M between 10 and 16 (which correspond to 50 and 80 minutes,
resp., see also Figure 2(a)), in increments of 2. For ease of presentation, let b be the energy cost scaled by a
factor of 1/(M - 1),

$$b = G/(M - 1) \tag{7}$$

We vary b according to our experimental goals among 0.2, 0.8 and 1.8, corresponding to small, medium and
high costs, respectively.
Figure 3: Rewards as predicted by our model and observed in DieselNet.



4.2.2 Location Oblivious Policies 



Next, we validate our model against traces assuming that users are restricted to threshold policies
(non-threshold policies are considered in the next section). Figure 3 shows, for different model parameters, the
averages of 1) the optimal reward predicted by our model, 2) the reward effectively obtained using the
optimal strategy computed by the model as input to our trace-driven simulator and 3) the reward of the optimal
trace-driven (threshold) policy, with the 95% confidence intervals (to increase the number of samples, each bus
shift is replayed 40 times).

In Figure 3, note that across all parameters, the rewards predicted by our model closely match the rewards
effectively obtained using the policy proposed by our model. When the energy cost is low
(b = 0.2), our model predictions also closely match the optimal trace-driven policy. When the energy cost is
medium (b = 0.8) the accuracy of the predictions of our model depends on the maximum age M. Recall that
typical bus runs last between 40 minutes and 1 hour and 20 minutes (see Figure 2(a)) and that when a bus
run is completed, a contact occurs with high probability (see Figure 1(a)). If the maximum age is 50 minutes
(M = 10), the fact that our model does not capture the correlations among contacts between buses and APs
does not play an important role. However, as M increases, strategically setting the policy to account for such
correlations becomes relevant. Similar reasoning holds when b = 1.8.

Figure 4(a) shows the distribution of the distance between the optimal threshold computed by our model and 
the optimal trace-driven threshold, for different model parameters. In accordance with our previous observations, 
when b = 0.2, the distance between the optimal threshold and the one computed by our model is smaller 
than or equal to two in at least 60% of the bus shifts. The same holds when b = 0.8 and M = 10. When 
M = 16 and b = 1.8, in contrast, the distance to the optimal threshold is smaller than two for fewer than 40% 
of the bus shifts. 

Figure 4(b) reports the mode of the optimal threshold as estimated by our trace-driven simulator and by 
our model. The mode of the optimal threshold increases with the energy cost (see Proposition 5.3), and the 
distance between the simulator and model predictions never surpasses two. 

Figure 4: Model validation: (a) distance between optimal trace-driven threshold and optimal model threshold; 
(b) optimal threshold mode; (c) trace-driven reward as a function of the threshold, where the threshold is assumed 
to be the same at all bus shifts. 

Figure 4(c) provides further insight into the problem when b = 1.8. Until this point, we allowed different 
bus shifts to correspond to different threshold policies (i.e., we considered bus shift discriminated strategies). 
Next, we consider users that adopt the same threshold policy over all bus shifts (i.e., we consider flat strategies 
over bus shifts). For different threshold values, Figure 4(c) shows the average reward obtained using our 
trace-driven simulations (see Figure 3 for confidence intervals). Note that the optimal threshold equals eight in the 
four scenarios under consideration. This threshold value, in turn, is in agreement with the mode predicted 
by our model (see Figure 4(b)). In Figure 5(a) we reproduce the results of Figure 4(c) using our model. 
We let p = 0.53, the median contact probability (see Figure Etc)). Comparing Figure SJc) and FigureEta), 
we observe that the empirical curves are predicted by our model with remarkable accuracy. The distance 
between the optimal threshold predicted by our model (marked with circles in Figure 5(a)) and the one 
obtained through the trace-driven simulations (consistently equal to eight) is smaller than or equal to 1, and 
the utility discrepancy does not surpass 0.5. 



4.2.3 Location Aware Policy 

Users can leverage geographic information in order to decide when to activate their mobile devices. At 
UMass, for instance, a central bus stop is located at the Haigis Mall (see Figure [IJb)). Buses usually linger 
at Haigis Mall for a couple of minutes, time at which transmission opportunities usually arise (see FigureQJa)). 
Therefore, in this section we consider the following location aware policy: activate the mobile device if in 
Haigis Mall, and remain inactive otherwise. Figure 3 shows the performance of our location aware policy 
obtained using trace-driven simulations. When the energy costs are not high (b = 0.2 or 0.8), it is always 
advantageous to activate the mobile device before the bus returns to Haigis Mall, in order to opportunistically 
take advantage of contacts with APs during the bus run. However, when the energy costs are high (b = 1.8), 
our location aware policy outperforms the best trace-driven threshold policy. The non-optimality of threshold 
policies in this scenario corroborates the fact that when energy costs are high, it is important to account for 
correlations among contact opportunities. 

variable    description 
a_t         action of user 
x_t         age of message 
r_t         reward 
e_t         =1 in case of useful contact opportunity, =0 otherwise 

parameter   description                                             determined by 
U(x)        utility of message at age x                             user 
B_t         bonus level                                             publisher 
P_3G        price of 3G                                             3G service provider 
P           price of WiFi                                           WiFi service provider 
G           activation cost 
p           probability of useful contact opportunity, p = E[e_t] 

Table 2: Table of notation. Notes: (1) Subscripts are dropped when in steady state. (2) B is a parameter 
in §5 and a variable in §6. 

4.2.4 Summary 

To sum up, our model accurately predicts the reward effectively obtained by its proposed policy. If the energy 
costs are low, or if users do not discriminate their strategies with respect to the bus shifts, our model can 
also predict the optimal policy obtained through trace-driven simulations. Otherwise, users can benefit from 
correlations among contact opportunities. 

5 Model Analysis 

Our goals now are (a) to derive the optimality conditions that must be satisfied by the optimal policy; (b) 
to show properties of the optimal threshold and (c) to present specialized results for step and linear utility 
functions. We tackle each of the goals in one of the subsequent sections, respectively. 

5.1 Optimal Policy General Structure 

We now derive the general structure of the optimal policy. To this end, consider a fixed policy u. In this 
section we assume that, under u, users have a positive activation probability in at least one state. The 
conditions for the optimality of the policy which consists of remaining always inactive will be established in 
Proposition 5.1. 

Let $\mathcal{P}_u$ denote the transition probability matrix of the Markov chain $\{x_t : t = 1, 2, \ldots\}$ which characterizes 
the dynamics of the age, given policy u. Let $r_u(x)$ be the expected instantaneous reward received in a time 
slot when the system is in state x and policy u is used. The vector of expected instantaneous rewards is 
denoted by $r_u$. 

Let the gain, $g_u$, be the average reward per time slot in steady state, $g_u = E[r; u]$. Since the number of 
states in the system is finite and from each state there is a positive probability of returning to state 1, $\mathcal{P}_u$ 
comprises a single connected component, and $g_u$ does not depend on the initial system state. 

The relative reward of state x at time t, V(x, t), is the difference between the expected total reward 
accumulated when the system starts in x and the expected total reward accumulated when the system starts 
in steady state. Let $V(x) = \lim_{t \to \infty} V(x, t) = \lim_{\ell \to \infty} \left( \sum_{t=0}^{\ell} \mathcal{P}_u^t (r_u - g_u e) \right)(x)$, where e is a column vector 
with all its elements equal to one. It follows from (Q} and similar arguments as to those in [T51 eq. (8.4.2)] 
that a policy which satisfies the following conditions, for $1 \le x \le M$, is optimal, 

$$V(x) = \max\big( U(x) + V(\min(x+1, M)) - g_u,\; U(x) - G + p(V(1) - P + B) + (1-p)V(\min(x+1, M)) - g_u \big)$$

An optimal policy is obtained from the optimality conditions as follows, 

$$a(x) = \begin{cases} 0, & \text{if } -G/p + (V(1) - P + B) \le V(\min(x+1, M)) \\ 1, & \text{otherwise} \end{cases} \tag{8}$$
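
As a worked step, the comparison that defines the optimal action can be recovered by subtracting the two arguments of the maximum in the optimality conditions. The sketch below uses only quantities defined above; resolving ties in favor of remaining inactive is an assumption of this sketch.

```latex
\begin{align*}
&\big[U(x) + V(\min(x+1,M)) - g_u\big]
 - \big[U(x) - G + p\,(V(1) - P + B) + (1-p)\,V(\min(x+1,M)) - g_u\big] \\
&\qquad = p\,V(\min(x+1,M)) + G - p\,(V(1) - P + B).
\end{align*}
% Since p > 0, the difference is non-negative (remaining inactive is at least as good) iff
\[
V(\min(x+1,M)) \;\ge\; -\frac{G}{p} + V(1) - P + B .
\]
```

The terms U(x) and g_u cancel, so the decision at age x depends only on the relative reward of the next age compared with that of a fresh message, net of the effective cost G/p + P - B.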

In Appendix |A1 we show that V{x) is decreasing on x when P = (the corresponding result when P > 
can be found in Appendix IA.3|) . Thus, the existence of an optimal policy in the class of threshold policies 
(Proposition 3.1) follows from (8). 

Note that adding a constant to U(x), $1 \le x \le M$, does not affect the optimal policy (8). Therefore, in 
the rest of this paper we assume, without loss of generality, that U(M) = 0. 

5.2 Optimal Threshold Properties 

In this section we aim at finding properties of the optimal threshold. To this end, we note that a user 
adopting a threshold strategy goes through cycles. Each cycle consists of an idle period and an active period. An idle 
period is initiated when the age is one, and ends immediately before the instant at which the age reaches the 
threshold s. At age s, an active period begins and lasts on average 1/p time slots, up to the instant at which 
the age is reset to one. 
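
The cycle structure described above can be checked with a short simulation. The following is a minimal sketch, not the trace-driven simulator used in the paper: it assumes independent contact opportunities with probability p in every slot, and the function names are hypothetical. A cycle has s - 1 idle slots followed by an active period of mean length 1/p, so age 1 should be visited once every (s - 1) + 1/p slots on average.

```python
import random


def simulate_update_rate(s, p, M, num_slots, seed=0):
    """Simulate a threshold policy; return the fraction of slots spent at age 1."""
    rng = random.Random(seed)
    age = 1
    slots_at_age_one = 0
    for _ in range(num_slots):
        if age == 1:
            slots_at_age_one += 1
        active = age >= s  # threshold policy: activate once the age reaches s
        if active and rng.random() < p:
            age = 1  # useful contact: a fresh message is received
        else:
            age = min(age + 1, M)  # the message keeps aging, capped at M
    return slots_at_age_one / num_slots


def renewal_rate(s, p):
    """One visit to age 1 per cycle of mean length (s - 1) + 1/p."""
    return 1.0 / ((s - 1) + 1.0 / p)


if __name__ == "__main__":
    print(simulate_update_rate(3, 0.5, 10, 200000, seed=1), renewal_rate(3, 0.5))
```

Note that (s - 1) + 1/p = s + (1 - p)/p, so the simulated rate can be compared directly with the steady state probability of age 1.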

The following proposition establishes conditions under which the optimal actions are invariant in 
time (see Appendix A.5). The proofs of the subsequent results are available in the appendices. 

Proposition 5.1. The optimal policy consists of being always active if and only if 

$$\frac{1}{1-p} \left( U(1) - p \sum_{x=1}^{M-1} U(x)(1-p)^{x-1} \right) \ge \frac{G}{p} + P - B \tag{9}$$

The optimal policy consists of being always inactive if and only if 

$$\sum_{j=1}^{M-1} U(M-j) \le \frac{G}{p} + P - B \tag{10}$$

Given that the condition for the optimal policy to be always inactive is established in the second part of 
Proposition 5.1, in what follows we focus on policies which consist of being active in at least one state. 

For a fixed threshold policy with corresponding threshold s, let the system state transition probability 
matrix, $\mathcal{P}_s$, be 

$$\mathcal{P}_s(i, j) = \begin{cases} 1, & i < s,\ j = i+1 \\ p, & i \ge s,\ j = 1 \\ 1-p, & s \le i < M,\ j = i+1 \\ 1-p, & i = j = M \\ 0, & \text{otherwise} \end{cases} \tag{11}$$

Figure 5: Model numerical evaluation (U(x) = M - x, P = 0, p = 0.54). 



Let $\pi$ be the steady state solution of the system, $\pi \mathcal{P}_s = \pi$. Then, the fraction of time slots in which a 
user issues updates is $\pi_1$ (see Appendix A.1), 

$$\pi_1 = \frac{1}{s + \frac{1-p}{p}} \tag{12}$$
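
The closed form for the steady state probability of age 1 can be checked numerically. The sketch below builds the transition matrix of the age process under a threshold s (ages 1, ..., M; idle below s, active from s on) and approximates its stationary distribution by repeatedly applying a lazy version of the chain. The function names and the choice of a lazy iteration are assumptions of this sketch, not part of the paper.

```python
def transition_matrix(s, p, M):
    """Row-stochastic matrix of the age chain under threshold s (1 <= s <= M)."""
    P = [[0.0] * M for _ in range(M)]
    for i in range(1, M + 1):  # i is the current age (1-indexed)
        nxt = min(i + 1, M)  # age after a slot without an update
        if i < s:
            P[i - 1][nxt - 1] = 1.0  # inactive: the age increases surely
        else:
            P[i - 1][0] += p  # active and useful contact: reset to age 1
            P[i - 1][nxt - 1] += 1.0 - p  # active but no useful contact
    return P


def stationary_distribution(P, iterations=5000):
    """Approximate pi with pi = pi P by iterating the lazy chain (I + P) / 2."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        step = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Averaging with the previous iterate avoids oscillations of periodic chains
        # and leaves the stationary distribution unchanged.
        pi = [0.5 * a + 0.5 * b for a, b in zip(pi, step)]
    return pi


if __name__ == "__main__":
    p, M = 0.54, 8
    for s in range(1, M + 1):
        pi = stationary_distribution(transition_matrix(s, p, M))
        print(s, pi[0], 1.0 / (s + (1.0 - p) / p))
```

For each threshold, the first component of the numerical solution should agree with 1/(s + (1 - p)/p) up to numerical precision.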



Next, we derive a closed-form expression for the expected reward as a function of the threshold s. 
In Appendix A.2 we show that replacing the expression of the steady state solution $\pi$ into 
$E[r; s] = \sum_{i=1}^{M} \pi_i\, r(i, \mathbf{1}_{i \ge s})$ yields, 

$$E[r; s] = \pi_1 \left( \sum_{x=1}^{s-1} U(x) + \sum_{i=0}^{M-1-s} U(i+s)(1-p)^i - \frac{G}{p} - P + B \right)$$

In Appendix A.6 we use the expression 
above to show that the expected average reward is non-decreasing in s, for s < s*, as stated in the following 
proposition. 



Proposition 5.2. The optimal threshold value s* is 

$$s^\star = \min \{ s \mid E[r; s] \ge E[r; s+1] \} \tag{13}$$
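
The threshold search of Proposition 5.2 can be sketched directly from the closed-form expected reward. In the sketch below, the function names are hypothetical, the threshold M + 1 stands for the always inactive policy, and U(M) = 0 is assumed (as in the rest of the paper), so that the always inactive policy has reward zero.

```python
def expected_reward(s, U, M, p, G, P=0.0, B=0.0):
    """Closed-form E[r; s] for a threshold s in 1..M+1 (s = M+1: always inactive)."""
    if s == M + 1:
        return 0.0  # the age stays at M, and U(M) = 0 is assumed
    pi_1 = 1.0 / (s + (1.0 - p) / p)
    idle = sum(U(x) for x in range(1, s))  # ages 1, ..., s-1
    active = sum(U(i + s) * (1.0 - p) ** i for i in range(0, M - s))  # i = 0, ..., M-1-s
    return pi_1 * (idle + active - G / p - P + B)


def optimal_threshold(U, M, p, G, P=0.0, B=0.0):
    """Smallest s with E[r; s] >= E[r; s+1]; M+1 if no such s <= M exists."""
    for s in range(1, M + 1):
        if expected_reward(s, U, M, p, G, P, B) >= expected_reward(s + 1, U, M, p, G, P, B):
            return s
    return M + 1


if __name__ == "__main__":
    M = 3
    linear = lambda x: max(M - x, 0)  # linear utility U(x) = M - x
    print(optimal_threshold(linear, M, p=0.5, G=0.5))  # low activation cost
    print(optimal_threshold(linear, M, p=0.5, G=2.0))  # high activation cost
```

With M = 3 and p = 0.5, a low cost (G = 0.5) makes it optimal to be always active, whereas a high cost (G = 2) satisfies condition (10) and the search returns M + 1, i.e., the always inactive policy.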



Finally, Proposition 5.3 formalizes the monotonicity of s* with respect to G observed in §4.2.2. 

Proposition 5.3. The optimal threshold s* is non-decreasing with respect to G and P and non-increasing 
with respect to B. 

5.3 Special Utility Functions 

Next, we specialize our results to two classes of utilities. 

5.3.1 Step Utility 

Let the utility function U(x) be given by a step function, U(x) = v if x < k and U(x) = 0 otherwise. The 
next proposition allows us to efficiently compute the optimal threshold in this case. Let $\varphi$ be the root of 
$\frac{d}{ds} E[r; s] = 0$ when k < s. The closed-form expression of $\varphi$ is given in Appendix A.8. 

Proposition 5.4. For the step function utility, the optimal threshold s* is equal to either 1, k - 1, $\lfloor \varphi \rfloor$, $\lceil \varphi \rceil$, 
M or M + 1. 

5.3.2 Linear Utility 

Let U(x) = M - x. Then, E[r; s] can be expressed in closed form as a function of s, M, p and G (see 
Appendix A.9). To illustrate the behavior of E[r; s], we let p = 0.54 (see Figure 5), P = B = 0, and vary b 
and M as shown in the legend of Figure 5. Figure 5(a) was discussed in §4.2.2 in light of our trace-driven 
simulations. Figure 5(b) shows that when b = 0.09 (resp., b = 3.18), inequality (9) (resp., (10)) holds and 
the optimal threshold is 1 (resp., 13), in agreement with Proposition 5.1. In accordance with Proposition 5.2, 
Figure 5(b) also indicates that when s < s* the reward is increasing. Finally, the optimal threshold in 
Figure 5(b) increases as a function of b = G/(M - 1), which serves to illustrate Proposition 5.3. 

6 The Publisher Bonus Package 

In this section we consider strategic publishers that offer bonus packages to users, so as to incent them to 
download updates of advertisement campaigns or unpopular content. In §6.1 we assume that publishers have 
complete information on the system parameters and consider the incomplete information case in §6.2. 

6.1 Complete Information 

Next, we consider publishers that, while devising their optimal bonus strategies, leverage the fact that users 
solve the User Problem. The optimal bonus strategy consists of finding the bonus level B that minimizes 
the average age of messages in the network, under the constraint that the expected number of messages 
transmitted per time slot is below a given budget, dictated by the service provider. 

Let N be the number of users in the network. In what follows, we make the dependence of s* and $\pi_1$ 
(see (12)) on the bonus level B explicit. Let Q be the average number of messages transmitted per time slot, 

$$Q = N \pi_1(B) = \frac{N}{s^\star(B) + (1-p)/p} \tag{14}$$

Figure 6: Learning algorithm: sample path of trace-driven simulation 



Let A be the average age of messages in the network (see Appendix A.10 for its closed-form expression). Let 
T be the constraint imposed by the service provider on the expected number of messages sent per time slot. 
Then, the problem faced by the publisher is 

Publisher Problem: Obtain bonus level B so as to 

$$\min\ A = \sum_{i=1}^{M} i\, \pi_i(B) \tag{15}$$

$$\text{s.t.}\quad Q \le T \tag{16}$$



Note that since s*(B) is not injective, there might be a range of bonus levels that solve the Publisher Problem. 
Under the assumption that the above problem admits a solution, which is guaranteed to exist if s(P) > 
N/T-(l-p)/p, 

Proposition 6.1. The solution of the Publisher Problem consists of setting the bonus level B in such 
a way that $s = \max(0, \min(M + 1, \lceil N/T - (1-p)/p \rceil))$. The solution can be found using a binary search 
algorithm. 
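
The target threshold of Proposition 6.1 and a binary search over bonus levels can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `user_threshold` is a hypothetical callable that returns the users' optimal threshold for a given bonus (non-increasing in the bonus, as in Proposition 5.3), and the search returns a bonus close to the upper end of the range of bonus levels that keep the threshold at or above the target.

```python
import math


def target_threshold(N, T, p, M):
    """Threshold from Proposition 6.1: max(0, min(M + 1, ceil(N/T - (1 - p)/p)))."""
    return max(0, min(M + 1, math.ceil(N / T - (1.0 - p) / p)))


def messages_per_slot(N, s, p):
    """Q from (14) for N users that all adopt threshold s >= 1."""
    return N / (s + (1.0 - p) / p)


def largest_feasible_bonus(user_threshold, s_target, B_max, tol=1e-6):
    """Binary search for (approximately) the largest bonus in [0, B_max] whose
    induced threshold is still >= s_target; None if even a zero bonus is infeasible.
    Assumes user_threshold is non-increasing in the bonus (hypothetical callable)."""
    lo, hi = 0.0, B_max
    if user_threshold(hi) >= s_target:
        return hi
    if user_threshold(lo) < s_target:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if user_threshold(mid) >= s_target:
            lo = mid  # still feasible: try a larger bonus
        else:
            hi = mid  # budget violated: reduce the bonus
    return lo


if __name__ == "__main__":
    s = target_threshold(N=50, T=11, p=0.54, M=30)
    print(s, messages_per_slot(50, s, 0.54))
```

A larger bonus lowers the users' threshold and hence the average age, which is why the sketch looks for the largest bonus that still respects the budget.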



6.2 Incomplete Information: Online Learning Algorithm 

How can publishers set their bonus levels without knowing the number of users in the system and their 
strategies? To answer this question, we present a simple learning algorithm to solve the Publisher Problem 
when the system parameters are unknown. 



Algorithm 1 Online estimation of optimal bonus level. 

1: Input: maximum bonus level B, target number of messages per slot T, round duration r time slots, learning rate a 
2: Choose initial bonus level B_0 such that B_0 ∈ [0, B]; initialize the round counter t 
3: while |T - Q_t| > e do 
4:     At the end of round t, 
5:     Q_t ← R_t / r 
6:     B_{t+1} ← min(B, max(0, B_t + a(T - Q_t)/t)) 
7:     t ← t + 1 
8: end while 






The proposed algorithm proceeds in rounds. Each round corresponds to r time slots, at which users have 
their requests served. Let B t be the bonus set by the publisher at the beginning of round t, and let R t be 
the number of requests served at that round. The average number of requests served per time slot in round 
t is Q t = R t /r. 

Algorithm 1 updates the bonus level as follows. At round t, if the average number of requests served 
per time slot is above the target T, the bonus level is decreased by a(Q_t - T)/t; otherwise, the bonus level 
is increased by a(T - Q_t)/t. The learning step size, a, is a learning parameter that impacts the algorithm 
convergence time. Smaller values of a yield a smoother but slower convergence [20, Chapter 5]. The bonus 
level is required to be non-negative, and cannot surpass the maximum bonus level, B (line 6 in Algorithm 1). The 
algorithm stops when |T - Q_t| ≤ e, where e > 0 is the tolerance parameter (line 3). Under the assumption 
that the system parameters (see Table 2) are fixed, 

Proposition 6.2. The sequence of bonus levels {B_t} converges to the optimal solution of the Publisher Problem 
with probability one. 
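
The update rule of Algorithm 1 can be sketched in a few lines. The following is an illustrative sketch rather than the authors' code: `served_rate` is a hypothetical callable standing in for the measurement Q_t = R_t / r at the end of a round, the round counter is assumed to start at 1 so that the step a/t is well defined, and `max_rounds` is an added safeguard that is not part of Algorithm 1.

```python
def online_bonus_estimation(served_rate, B_max, T, a, e, B_init=0.0, max_rounds=10000):
    """Sketch of the bonus update of Algorithm 1.

    served_rate(B) -> average number of requests served per slot in a round in
    which the bonus is B (hypothetical stand-in for Q_t = R_t / r).
    Returns the final bonus level and the round at which the loop stopped.
    """
    B = min(B_max, max(0.0, B_init))
    t = 1  # assumption: rounds are counted from 1
    while t <= max_rounds:
        Q = served_rate(B)  # measured at the end of round t
        if abs(T - Q) <= e:  # stopping rule (line 3)
            break
        # Diminishing step a/t; the bonus is projected onto [0, B_max] (line 6).
        B = min(B_max, max(0.0, B + a * (T - Q) / t))
        t += 1
    return B, t


if __name__ == "__main__":
    # Toy response in which the served rate grows linearly with the bonus.
    print(online_bonus_estimation(lambda B: 2.0 * B, B_max=10.0, T=6.0, a=1.0, e=0.05))
```

In the toy run, the bonus overshoots in the first round and is then corrected with a smaller step, which mirrors the role of the decreasing step size a/t.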

According to Proposition 6.2, the convergence of Algorithm 1 is assured when the system parameters 
are fixed. In order to 1) study the behavior of our algorithm when the population size varies and 2) 
investigate how the convergence speed may be affected by the correlation among users, we conducted 
trace-driven simulations, whose results are reported in Appendix D.3. Next, we use Figure 6 to illustrate some of 
our findings. 

Figure 6 shows a sample path of our trace-driven simulations. We let a = 1, M = 30, which corresponds 
to 2:30 hours (see Figure 2(a)), r = 100, which corresponds to up to 3 updates per day, p = 0.54 (see 
Figure HJb)), G — 0.4, P = B = 40 and T = 11. The number of users is initially 50, and decreases to 
20 at round 200. We assume that users solve the User Problem when setting their activation strategies, 
and Algorithm 1 is run by the publisher every 100 rounds. We consider half of the population in one bus 
and the other half in another (see Appendix D.3). Despite the correlations among users, Figure 6 does not 
qualitatively deviate from the results obtained with uncorrelated users. In particular, the algorithm converges 
in up to 20 rounds, and the number of transmissions per slot varies between 9 and 11. The oscillations are 
not always centered at T = 11 due to the fact that the threshold adopted by the users is integer valued, 
which might prevent inequality (16) from binding. Note that the bonus level converges to values in the optimal 
range, obtained using Proposition 6.1 and marked with dotted lines in Figure 6. When N = 20, the publisher 
sponsors the service provider costs (B = P), and users make their updates for free. 

7 Discussion of Assumptions, Limitations and Future Work 

Next, we discuss the main simplifications adopted to yield a tractable model (Assumption [301 is discussed in 
details in SJH so we do not include it in the following list). 

Frequent decisions assumption: We assume that users are interested in maximizing their expected 
average rewards. This assumption is appropriate if decisions are made frequently. In our measurement study, 
we assumed that users make decisions every 5 minutes, and we observed that for a vast number of bus shifts 
the system parameters are stable over hours. 

High load assumption: We assume that updates are issued every time slot with high probability. 
As discussed in §2.3.1, this assumption holds for certain news feed categories such as top stories. The high 
load assumption also holds if users synchronize the updates of multiple applications, using APIs such as 
OVI Notifications [15]. In this case, the larger the number of applications that require updates, the higher 
the chances of at least one update being available every time slot. If multiple application updates can be 
performed at a single time slot, the bundle being subject to no additional costs or bonus, our model holds 
without modifications. However, if the interaction among the applications is non-trivial, new policies need 
to be devised accounting for decisions such as when and from whom to download updates from each of the 
applications (see JJ5]for complementary work on that topic). 

Self-regarding users assumption: We assume that users do not collaborate with each other. Although 
researchers are interested in leveraging collaboration among users [21, 22], a large number of mobile systems 
still do not take advantage of peer-to-peer transfers, and users need to download their messages exclusively 
from access points or base stations [23]. Our model applies to such systems. Nevertheless, we envision that if 
peers are roughly uniformly distributed, peer-assisted opportunistic transfers can be easily captured by our 
model simply by adapting the contact probability, p, in order to account for peer-to-peer contacts. Future 
work consists of accounting for spatial information when modeling the likelihood of contact opportunities. 

Finally, note that in this paper we do not consider prediction algorithms in order to infer future opportu- 
nities with WiFi access points. In particular, some of the algorithms without prediction previously proposed 
in the literature, such as j3j §6.1.1], are special instances of the aging control policies described in this paper. 
As shown in 21 neglecting the possibility of prediction comes with no significant loss in scenarios of practical 
interest, the incorporation of prediction algorithms in our framework being subject for future work. 

8 Related Work 

The literature on measuring [11 [Ml 123], modeling [331 US (Ml (23 HE] and control [H [301 El 121 131 121 [Mj 

in wireless networks is vast. Nevertheless, we were not able to find any previous study on the aging control 
problem as described in this paper. Previous work accounted for the modeling of aging [25] or for the age 
control by publishers [22, 31], but not for users' aging control as described in this paper. We were also not 
able to find any previous study on the implications of bonus packages set by content and/or service 
providers [5]. 

Chaintreau et al. [25] model the distribution of message ages in a large scale mobile network using a 
spatial mean field approach. Their model allows the analysis of gossiping through opportunistic contacts. In 
this paper, in contrast, we assume that nodes rely exclusively on base stations and access points in order to 
receive their updates. 

Activation control strategies were proposed in [351 [531 1301 HI] • In [2H] the authors consider publishers of 
evolving files, that aim at reducing their energy expenditure by controlling the probability of transmitting 
messages to users. In [30], a joint activation and transmission control policy is proposed so as to maximize 
the throughput of users under energy constraints. In [11], a joint activation and link selection control policy is 
proposed so as to minimize the energy consumption under delay constraints. Our work differs from [29, 30, 11] 
in two ways as 1) we investigate the activation policy of mobile nodes based on the utility of their messages 
and 2) we study the publishers' bonus strategy. In addition, our analysis is carried out using MDPs and is 
validated using UMass DieselNet traces (similar methodology is adopted, for instance, by Yang et al. [36]). 

The utility function introduced in this paper corresponds to the impatience function presented by Reich 
and Chaintreau [31]. Reich and Chaintreau [31] study the implications of delays between requests and services 
on optimal content replication schemes. If users have limited caches and cannot download all the requested 
files every time they are in range of an access point, the insights provided by [23 need to be coupled with 
the ones presented here in order to devise the optimal joint activation-replication strategy. Therefore, aging 
control, as described in this paper, significantly complements replication control, as described in [31, 37]. 

9 Conclusion 

This paper reports our measurement and modeling studies of aging control in hybrid wireless networks. From 
the DieselNet measurements, we learned that correlations among contact opportunities do not play a key 
role if the energy costs incurred by users are small or if users cannot discriminate their strategies based on 
bus shifts. We then modeled and solved the aging control problem, and used trace-driven simulations to 
show that a very simple threshold strategy derived from our model performs well in practice. When 
publishers are strategic, we analyzed the bonus package selection problem, and showed that in some scenarios 
it is beneficial for publishers to fully sponsor the content updates requested by users. We believe that the 
study of mechanisms to support applications that require frequent updates, such as microblogging, in wireless 
networks, is an interesting field of research, and we see our paper as a first attempt to shed light on the 
tradeoffs faced by users and publishers of such applications. 

A Derivation of Main Results When $P_{3G} \to \infty$ 

We begin by analyzing the case in which 3G is not available. The optimality conditions are 

$$V(x) + g_u = \max\big( U(x) + V(\min(x+1, M)),\; U(x) - G + p(V(1) - P + B) + (1-p)V(\min(x+1, M)) \big), \quad x = 1, \ldots, M \tag{17}$$

Let 

$$H(x, 0) = U(x) + V(\min(x+1, M)) \tag{18}$$

$$H(x, 1) = U(x) - G + p(V(1) - P + B) + (1-p)V(\min(x+1, M)) \tag{19}$$

It follows from (fT7)) -(fl"9 |) and [H] that the following policy is optimal, for m = 1, . . . , M, 

$$a(m) = \begin{cases} 0, & \text{if } H(m, 0) \ge H(m, 1) \\ 1, & \text{otherwise} \end{cases} \tag{20}$$

At states M and M - 1, (17) implies that 

$$\begin{cases} (H(M, 0) \ge H(M, 1)) \wedge (H(M-1, 0) \ge H(M-1, 1)), & \text{if } V(M) \ge U(M) - G + p(V(1) - P + B) + (1-p)V(M) \\ (H(M, 0) < H(M, 1)) \wedge (H(M-1, 0) < H(M-1, 1)), & \text{otherwise} \end{cases} \tag{21}$$

Equation (21) yields the following remark, used in the analysis of the base cases of the inductive arguments 
that follow. 

Remark A.1. $(H(M, 0) \ge H(M, 1)) \iff (H(M-1, 0) \ge H(M-1, 1))$. 

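A.1 Derivation of (12) and corresponding steady state probabilities 

From (11), 

$$\pi_i = \pi_1, \quad i = 1, \ldots, s \tag{22}$$

$$\pi_i = \pi_{i-1}(1-p) = \pi_1 (1-p)^{i-s}, \quad i = s+1, \ldots, M-1 \tag{23}$$

$$\pi_M = (1-p)(\pi_{M-1} + \pi_M) = \frac{1-p}{p}\, \pi_{M-1} \tag{24}$$

Therefore, 

$$\pi_1 = \left( s + \sum_{i=s+1}^{M-1} (1-p)^{i-s} + \frac{(1-p)^{M-s}}{p} \right)^{-1} = \left( s + \sum_{i=1}^{M-1-s} (1-p)^{i} + \frac{(1-p)^{M-s}}{p} \right)^{-1} = \left( s + \frac{1-p}{p} \right)^{-1}$$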
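
A.2 Derivation of expression of E[r; s] 

Next, we show that 

$$E[r; s] = \pi_1 \left( \sum_{x=1}^{s-1} U(x) + \sum_{i=0}^{M-1-s} U(i+s)(1-p)^i - \frac{G}{p} - P + B \right) \tag{25}$$

From (22)-(24), $\pi_M = \pi_1 (1-p)^{M-s}/p$ and $\sum_{i=s}^{M-1} \pi_i = \pi_1 (1 - (1-p)^{M-s})/p$. Hence, 

$$\frac{(U(M) - G - pP + pB)\, \pi_M - (G + pP - pB) \sum_{i=s}^{M-1} \pi_i}{\pi_1} = \frac{(1-p)^{M-s}}{p} (U(M) - G - pP + pB) - (G + pP - pB)\, \frac{1 - (1-p)^{M-s}}{p} \tag{26}$$

Then, recalling that U(M) = 0, 

$$E[r; s] = \sum_{i=1}^{s-1} U(i)\, \pi_i + \sum_{i=s}^{M} (U(i) - G - pP + pB)\, \pi_i \tag{27}$$

$$= \pi_1 \left[ \sum_{i=1}^{s-1} U(i) + \sum_{i=0}^{M-1-s} U(i+s)(1-p)^i + \frac{(1-p)^{M-s}}{p} (U(M) - G - pP + pB) - (G + pP - pB)\, \frac{1 - (1-p)^{M-s}}{p} \right]$$

$$= \pi_1 \left[ \sum_{i=1}^{s-1} U(i) + \sum_{i=0}^{M-1-s} U(i+s)(1-p)^i - \frac{G}{p} - P + B \right] \tag{28}$$

A.3 Proof of Proposition A.1 

Proposition A.1. The User Problem admits an optimal threshold policy 

$$a(x) = \begin{cases} 0, & \text{if } x < s^\star \\ 1, & \text{if } s^\star \le x \le M \end{cases} \tag{29}$$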



Proof. We show that, for m = 0, ..., M - 1, 

$$V(M - m - 1) - V(M - m) \ge 0 \tag{30}$$

$$(H(M - m, 0) \ge H(M - m, 1)) \Rightarrow (H(M - m - 1, 0) \ge H(M - m - 1, 1)) \tag{31}$$

We consider two scenarios, $H(M, 0) \ge H(M, 1)$ and $H(M, 0) < H(M, 1)$. 

scenario 1) 

$$H(M, 0) \ge H(M, 1) \tag{32}$$

If H satisfies (31) then, from (17)-(19), 

$$V(m) = U(m) + V(m + 1), \quad 1 \le m \le M - 1 \tag{33}$$

and (30) holds. 

Next, assume for the sake of contradiction that H does not satisfy (31). Let m be the largest state at 
which condition (31) is violated, 

$$m = \max\{ i \mid H(M - i, 0) \ge H(M - i, 1) \text{ and } H(M - i - 1, 1) > H(M - i - 1, 0) \} \tag{34}$$

It follows from (34) that 

$$H(M - m - 1, 1) > H(M - m - 1, 0) \tag{35}$$

$$H(M - m + k, 0) \ge H(M - m + k, 1), \quad k = 0, 1, \ldots, m \tag{36}$$

(35) and (36) yield, respectively, 

$$V(M - m) - V(1) < -G/p - P + B \tag{37}$$

$$V(M - k) - V(1) \ge -G/p - P + B, \quad k = 0, \ldots, m - 1 \tag{38}$$

Letting k = m - 1 in (38), 

$$V(M - m) = U(M - m) + V(M - m + 1) \ge U(M - m) + V(1) - G/p - P + B \tag{39}$$

(37) and (39) yield the following contradiction 

$$V(1) - G/p - P + B \le V(M - m) < V(1) - G/p - P + B \tag{40}$$

Therefore, (31) holds for m = 0, ..., M - 1. 

scenario 2) 

$$H(M, 0) < H(M, 1) \tag{41}$$

Base case: We first show that $V(M - 1) \ge V(M)$. 

Note that $(H(M, 1) > H(M, 0)) \Rightarrow (H(M - 1, 1) > H(M - 1, 0))$ (see Remark A.1). It follows from (17) 
that 

$$V(M - 1) = U(M - 1) - G - pP + pB + pV(1) + (1 - p)V(M) - g_u \tag{42}$$

$$V(M) = U(M) - G - pP + pB + pV(1) + (1 - p)V(M) - g_u \tag{43}$$

Hence, (42)-(43), together with the fact that U(x) is non-increasing, yield $V(M - 1) \ge V(M)$. 

Induction hypothesis [assume result holds for m < t]: Assume that $V(M - m - 1) - V(M - m) \ge 0$, 
for m < t, and $(H(M - m, 0) \ge H(M - m, 1)) \Rightarrow (H(M - m - 1, 0) \ge H(M - m - 1, 1))$, for m < t. 

Induction step [show result holds for m = t]: Next, we show that $V(M - t - 1) - V(M - t) \ge 0$ 
and that $(H(M - t, 0) \ge H(M - t, 1)) \Rightarrow (H(M - t - 1, 0) \ge H(M - t - 1, 1))$. 

It follows from the induction hypothesis that $V(M - t + 1) \le V(M - t)$. We consider two cases, 

i) $H(M - t, 0) \ge H(M - t, 1)$: The proof is similar to that of scenario 1. 

ii) $H(M - t, 1) > H(M - t, 0)$: Now, we show that 

$$V(M - t - 1) - V(M - t) \ge 0.$$

Note that $(H(M - t, 0) \ge H(M - t, 1)) \Rightarrow (H(M - t - 1, 0) \ge H(M - t - 1, 1))$ holds vacuously. If 
$(H(M - t, 1) > H(M - t, 0))$ and $(H(M - t - 1, 1) > H(M - t - 1, 0))$, 

$$V(M - t) + g_u = U(M - t) - G - pP + pB + pV(1) + (1 - p)V(\min(M - t + 1, M)) \tag{44}$$

$$V(M - t - 1) + g_u = U(M - t - 1) - G - pP + pB + pV(1) + (1 - p)V(\min(M - t, M)) \tag{45}$$

Also, $V(M - t + 1) \le V(M - t)$ (induction hypothesis) and $U(M - t) \le U(M - t - 1)$ (by assumption). 
Hence, (44)-(45) yield $V(M - t) \le V(M - t - 1)$. 

□ 

Remark A.2. In what follows our analysis is restricted to threshold policies. 

A.4 On the number of optimal thresholds 

The existence of policies satisfying (17) follows from [38] and [39]. In particular, the optimal threshold policies 
satisfy (17). In general, though, the solution to (17) is not unique. Next, we characterize the scenarios in 
which there are two or more optimal threshold policies. 

Figure 7 illustrates scenarios in which Assumption A.1 does not hold. Let b = 0.3, M = 21, p = 0.5 and 
U(x) = c if x ≤ 3 and U(x) = 0 otherwise. Figures 7(b), 7(d) and 7(f) show the three utility functions 
considered, corresponding to c = 4, 12 and 16, respectively. If c = 4, threshold policies with s > 3 yield 
optimal reward of zero. If c = 12, there are two optimal thresholds (s = 2, 3). Finally, if c = 16 the optimal 
threshold policy is unique (s = 2). 
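
The three scenarios above can be reproduced from the closed-form expression (25) of the expected reward. The sketch below is only an illustration with hypothetical function names; it takes G = b(M - 1) = 6, P = B = 0, uses the threshold M + 1 for the always inactive policy (whose reward is zero, since U(M) = 0), and lists every threshold that attains the maximum reward.

```python
def expected_reward(s, U, M, p, G, P=0.0, B=0.0):
    """E[r; s] from (25) for s in 1..M; s = M+1 denotes the always inactive policy."""
    if s == M + 1:
        return 0.0  # the age stays at M, where U(M) = 0
    pi_1 = 1.0 / (s + (1.0 - p) / p)
    idle = sum(U(x) for x in range(1, s))
    active = sum(U(i + s) * (1.0 - p) ** i for i in range(0, M - s))
    return pi_1 * (idle + active - G / p - P + B)


def step_utility(c, k=3):
    """U(x) = c if x <= k, and 0 otherwise."""
    return lambda x: c if x <= k else 0.0


M, p, G = 21, 0.5, 6.0  # G = b (M - 1) with b = 0.3


def optimal_thresholds(c, tol=1e-9):
    """All thresholds in 1..M+1 whose reward is within tol of the maximum."""
    rewards = {s: expected_reward(s, step_utility(c), M, p, G) for s in range(1, M + 2)}
    best = max(rewards.values())
    return [s for s in sorted(rewards) if abs(rewards[s] - best) <= tol]


if __name__ == "__main__":
    for c in (4, 12, 16):
        print(c, optimal_thresholds(c))
```

For c = 4 the sum of the utilities equals G/p, so every threshold from 3 onwards (including the always inactive policy) attains the optimal reward of zero, whereas c = 12 yields the two optimal thresholds 2 and 3 and c = 16 yields the single optimal threshold 2.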

Proposition A.2 establishes conditions according to which the number of optimal thresholds is at most 
two. 

Proposition A.2. Let R be the number of optimal thresholds. 

$$\begin{cases} R \ge 3, & \text{if } \exists\, m \le M - 1 \text{ s.t. } \sum_{x=1}^{m} U(x) = G/p + P - B \text{ and } U(x) = 0\ \ \forall x > m \\ R \le 2, & \text{otherwise} \end{cases} \tag{46}$$

If R ≥ 3 then the policy always inactive is optimal. 

Figure 7: Illustration of scenarios in which the optimal threshold is not unique ((a) and (c)) and unique (e). 

Proof. From Proposition A.3, the expected reward E[r; s] of a threshold policy is non-decreasing when s < s*, 
and is non-increasing when s > s*. Therefore, the optimal thresholds should be consecutive. Let s* - 1, s* 
and s* + 1 be three optimal thresholds, 

$$E[r; s^\star - 1] = E[r; s^\star] = E[r; s^\star + 1] \tag{47}$$

From (47) and (25), 

$$\left( s^\star - 1 + \frac{1-p}{p} \right) E[r; s^\star - 1] - \left( s^\star + \frac{1-p}{p} \right) E[r; s^\star] = -E[r; s^\star] = -p \sum_{i=1}^{M - s^\star} U(i + s^\star - 1)(1 - p)^{i-1} \tag{48}$$

$$\left( s^\star + \frac{1-p}{p} \right) E[r; s^\star] - \left( s^\star + 1 + \frac{1-p}{p} \right) E[r; s^\star + 1] = -E[r; s^\star] = -p \sum_{i=1}^{M - s^\star - 1} U(i + s^\star)(1 - p)^{i-1} \tag{49}$$

Subtracting (49) from (48), 

$$U(s^\star) = 0 \tag{50}$$

The above equation together with the fact that U(x) is a non-increasing function yields 

$$U(x) = 0, \quad x \ge s^\star \tag{51}$$

(51) together with (48) and (49) yields 

$$E[r; s^\star - 1] = E[r; s^\star] = E[r; s^\star + 1] = E[r; M + 1] = 0 \tag{52}$$

Substituting (51) and (52) into (25) yields 

$$\sum_{x=1}^{s^\star - 1} U(x) - \frac{G}{p} - P + B = 0. \tag{53}$$

Therefore, if (53) and (51) hold then R ≥ 3. Otherwise, R ≤ 2. 

□ 

Corollary A.1. If the policy always inactive is not optimal, the number of optimal thresholds is at most 
two. 

Corollary A.2. If the optimal reward is greater than zero, the number of optimal thresholds is at most two. 

Corollary A.3. Let s* be an optimal threshold. If E[r; s*] ≠ E[r; s* + 1] and E[r; s*] ≠ E[r; s* - 1] then the 
optimal threshold is unique. 

Given the above characterization of the cases in which the optimal threshold policy is not unique, in the 
rest of this Appendix we consider the following assumption. 

Assumption A.1. There is at most one optimal threshold policy. 
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A. 5 Proof of Proposition 15.11 

Proof. Let a be the optimal threshold policy. 

A. 5.1 Conditions for the optimal policy to be always inactive 

It follows from the optimality conditions (fTTj) and Assumption IA.1I that 

a[x) = 0, 1 < x < M H(x, 1) < H(x, 0), 1 < x < M (54) 
Therefore, from the above equation and (fT8)) - ([T9"l) . 

a(x) = 0, 1 < x < M V(l) - V(x) < G/p + P — B,l < x < M (55) 



Since V(x) is decreasing (see Proposition lA.l[) . (|55|) yields 

a(x) = 0, 1 < x < M ^ 7(1) - V(M) <G/p + P-B (56) 

or equivalently, 

a(x) = 0, 1 < x < M <=^> V(M) > V{1) -G/p-P + B (57) 

From (TTT)) . 

a{x) =0,l<x<M <t=^> V{i) = U(i) + V(i + 1), 1 < i < M - 1 (58) 



where <^ in (|58|l follows from the assumption that there is a unique optimal policy that satisfies (fT7|) . 
Therefore, 

M 

a(x)=0,l<x<M V(j) = ^2U(i),j = l,...,M (59) 



Finally, §7$ and O yield 

M 

a(x) = 0, 1 < x < M ^ 0>Y^ U{i) -G/p-P + B (60) 

i=l 

A.5.2 Conditions for the optimal policy to be always active

It follows from the optimality conditions (17) and Assumption A.1 that

$$a(x) = 1,\ 1 \le x \le M \iff H(x,1) > H(x,0),\ 1 \le x \le M \tag{61}$$

Therefore, from the above equation and (18)-(19),

$$a(x) = 1,\ 1 \le x \le M \iff V(1) - V(x) > G/p + P - B,\ 1 \le x \le M \tag{62}$$

Since V(x) is decreasing (see Proposition A.1), it follows from (62) that

$$a(x) = 1,\ 1 \le x \le M \iff V(1) - V(2) > G/p + P - B \tag{63}$$

or equivalently,

$$a(x) = 1,\ 1 \le x \le M \iff V(2) < V(1) - G/p - P + B \tag{64}$$

Letting x = 1 in (17) yields

$$a(x) = 1,\ 1 \le x \le M \iff V(1) = -G + U(1) + pV(1) - pP + pB + (1-p)V(2) - E[r; 1] \tag{65}$$

where $\Leftarrow$ in (65) follows since our analysis is restricted to threshold policies. Note that a(1) = 1 is implied by the right-hand side of (65) together with (17). If a(1) = 1 then a(x) = 1, 1 ≤ x ≤ M.
Finally, (64) and (65) yield

$$V(1) = -G + U(1) + pV(1) - pP + pB + (1-p)V(2) - E[r; 1] \tag{66}$$

$$\iff V(1) < -G + U(1) + pV(1) - pP + pB + (1-p)\big(V(1) - G/p - P + B\big) - E[r; 1] \tag{67}$$

The desired result follows from algebraic manipulation of (67),

$$a(x) = 1,\ 1 \le x \le M \iff E[r; 1] < U(1) - G/p - (P - B)$$



□ 

Remark A.3. If Assumption A.1 does not hold, the proof of Proposition 5.1 presented above remains valid after replacing all $\iff$ by $\Rightarrow$.

A.6 Proof of Proposition 5.2

The proof of Proposition 5.2 follows directly from Proposition A.3.

Proposition A.3. If s ≤ s* - 1 then E[r; s] < E[r; s + 1]. Otherwise, E[r; s] > E[r; s + 1].

Proof. Next, we show that if s ≤ s* - 1 then E[r; s] < E[r; s + 1] (the other case follows similarly). Let s* be the optimal threshold. It follows from Definition 3.1 that E[r; s] - E[r; s*] ≤ 0 for 1 ≤ s ≤ M. Next, we show that E[r; s* - m] > E[r; s* - (m + 1)] for m < s*. The proof is by induction on m.

Algebraic manipulation of (23) yields

$$\begin{aligned}
\left(s-1+\frac{1-p}{p}\right)E[r;s-1] - \left(s+\frac{1-p}{p}\right)E[r;s]
&= \left[\sum_{i=1}^{s-2}U(i) + \sum_{j=0}^{M-s}U(j+s-1)(1-p)^{j} - \frac{G}{p} - P + B\right] \\
&\quad - \left[\sum_{i=1}^{s-1}U(i) + \sum_{i=1}^{M-s}U(i+s-1)(1-p)^{i-1} - \frac{G}{p} - P + B\right] \\
&= -p\sum_{i=1}^{M-s}U(i+s-1)(1-p)^{i-1}.
\end{aligned} \tag{69}$$

Equation (69) yields

$$\left(s-1+\frac{1-p}{p}\right)\big(E[r;s-1]-E[r;s]\big) = E[r;s] - p\sum_{i=1}^{M-s}U(i+s-1)(1-p)^{i-1} \tag{70}$$

Base case: It follows from Definition 3.1 that the statement holds for m = 0. Note that E[r; s* - 1] - E[r; s*] < 0 yields

$$\left(s^*-1+\frac{1-p}{p}\right)\big(E[r;s^*-1]-E[r;s^*]\big) \overset{(*)}{=} E[r;s^*] - p\sum_{i=1}^{M-s^*}U(i+s^*-1)(1-p)^{i-1} < 0 \tag{71}$$

$$\left(s^*+\frac{1-p}{p}\right)\big(E[r;s^*-1]-E[r;s^*]\big) \overset{(**)}{=} E[r;s^*-1] - p\sum_{i=1}^{M-s^*}U(i+s^*-1)(1-p)^{i-1} \tag{72}$$

$$E[r;s^*-1] - p\sum_{i=1}^{M-s^*}U(i+s^*-1)(1-p)^{i-1} < 0 \tag{73}$$

$$E[r;s^*-1] < p\sum_{i=1}^{M-s^*}U(i+s^*-1)(1-p)^{i-1} \tag{74}$$

where (*) follows from (25) and (**) follows from (*) after summing E[r; s* - 1] - E[r; s*] to both sides.
Induction hypothesis: Assume that E[r; s* - m] > E[r; s* - (m + 1)] for m < t.
Induction step: We show that the proposition holds for m = t. To this goal, we compare E[r; s* - t] and E[r; s* - (t + 1)],

$$\begin{aligned}
\left(s^*-t-1+\frac{1-p}{p}\right)\big(E[r;s^*-t]-E[r;s^*-t-1]\big)
&\overset{(*)}{=} p\sum_{i=1}^{M-s^*+t}U(i+s^*-t-1)(1-p)^{i-1} - E[r;s^*-t] \\
&\ge p\sum_{i=1}^{M-s^*}U(i+s^*-t-1)(1-p)^{i-1} - E[r;s^*-t] \\
&\ge p\sum_{i=1}^{M-s^*}U(i+s^*-1)(1-p)^{i-1} - E[r;s^*-t] \\
&\overset{(**)}{\ge} p\sum_{i=1}^{M-s^*}U(i+s^*-1)(1-p)^{i-1} - E[r;s^*-1] \overset{(***)}{>} 0
\end{aligned}$$

where (*) follows from (70), (**) follows from the induction hypothesis and (***) follows from (74). The first inequality drops non-negative terms of the sum and the second uses the fact that U is non-increasing. The proof is completed by noting that s* - t - 1 + (1 - p)/p > 0, hence E[r; s* - t] > E[r; s* - t - 1].

□ 
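As a numerical illustration of Proposition A.3, the following minimal sketch evaluates E[r; s] for every threshold and checks that the rewards increase up to the maximizer and decrease afterwards. The names `expected_reward` and `optimal_threshold`, the 0-based list `U`, and all parameter values are illustrative assumptions rather than part of the original model; the formula assumes U(M) = 0 and follows the expression of E[r; s] used in this appendix.

```python
def expected_reward(s, U, p, G, P, B):
    """Average reward of the threshold policy that is active at ages x >= s.

    U[x - 1] is the utility at age x (ages 1..M); assumes U[M - 1] == 0.
    The threshold s = M + 1 stands for the always-inactive policy.
    """
    M = len(U)
    if s == M + 1:
        return 0.0
    idle = sum(U[x - 1] for x in range(1, s))  # ages 1..s-1, device off
    # Age x >= s is reached with probability (1 - p)^(x - s) while active.
    active = sum(U[x - 1] * (1 - p) ** (x - s) for x in range(s, M + 1))
    cycle = s - 1 + 1 / p  # expected cycle length, equal to s + (1 - p)/p
    return (idle + active - G / p - P + B) / cycle


def optimal_threshold(U, p, G, P, B):
    """Brute-force argmax of E[r; s] over s = 1..M+1 (smallest s on ties)."""
    M = len(U)
    return max(range(1, M + 2),
               key=lambda s: expected_reward(s, U, p, G, P, B))


# Hypothetical example: linear utility U(x) = M - x with M = 5.
U = [4, 3, 2, 1, 0]
rewards = [expected_reward(s, U, p=0.5, G=1.0, P=2.0, B=0.0)
           for s in range(1, 7)]
best = optimal_threshold(U, p=0.5, G=1.0, P=2.0, B=0.0)
```

In this example the rewards are unimodal in s, which is the property that makes a simple scan (or a search that stops at the first decrease) sufficient.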

A.7 Proof of Proposition 5.3

Proof. In what follows we show that the optimal threshold increases with respect to G. The proof that the optimal threshold increases with respect to P and decreases with respect to B is similar. From (28),

$$\frac{\partial}{\partial G}E[r; s] = -\frac{1}{ps + 1 - p} \tag{75}$$

Let s_1 > s_0. Then,

$$\lim_{\Delta G \to 0}\frac{E[r; s_1, G+\Delta G] - E[r; s_1, G]}{\Delta G} > \lim_{\Delta G \to 0}\frac{E[r; s_0, G+\Delta G] - E[r; s_0, G]}{\Delta G} \tag{76}$$

From (28), it also follows that

$$E[r; s_1, G+\Delta G] - E[r; s_1, G] = -\frac{1}{s_1 + (1-p)/p}\,\frac{\Delta G}{p} \tag{77}$$

Therefore,

$$E[r; s_1, G] - E[r; s_1, G+\Delta G] < E[r; s_0, G] - E[r; s_0, G+\Delta G] \tag{78}$$

Assume, for the sake of contradiction, that s_1 and s_0 are optimal thresholds when the energy cost is G and G + ΔG, respectively,

$$E[r; s_1, G] > E[r; s, G], \quad s \ne s_1 \tag{79}$$

$$E[r; s_0, G+\Delta G] > E[r; s, G+\Delta G], \quad s \ne s_0 \tag{80}$$

In particular,

$$E[r; s_1, G] > E[r; s_0, G] \tag{81}$$

$$E[r; s_0, G+\Delta G] > E[r; s_1, G+\Delta G] \tag{82}$$

Then, E[r; s_1, G] - E[r; s_1, G + ΔG] > E[r; s_0, G] - E[r; s_0, G + ΔG], which contradicts (78).

□

Remark A.4. In (78), < should be replaced by ≤ if Assumption A.1 does not hold. In this case, the optimal threshold is non-decreasing with respect to G.
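The monotonicity of the optimal threshold in the energy cost can also be probed numerically. The sketch below is a hedged illustration, not the authors' procedure: it recomputes the brute-force optimal threshold for a few hypothetical values of G (the utility vector, p, P and B are assumptions chosen for the example, with U(M) = 0) and collects the resulting thresholds.

```python
def expected_reward(s, U, p, G, P, B):
    """E[r; s] for the policy active at ages x >= s; s = M + 1 is always inactive."""
    M = len(U)
    if s == M + 1:
        return 0.0
    idle = sum(U[x - 1] for x in range(1, s))
    active = sum(U[x - 1] * (1 - p) ** (x - s) for x in range(s, M + 1))
    return (idle + active - G / p - P + B) / (s - 1 + 1 / p)


def optimal_threshold(U, p, G, P, B):
    """Smallest maximizer of E[r; s] over s = 1..M+1."""
    return max(range(1, len(U) + 2),
               key=lambda s: expected_reward(s, U, p, G, P, B))


# Hypothetical sweep over the energy cost G (all other parameters fixed).
U = [4, 3, 2, 1, 0]
energy_costs = [0.5, 1.0, 2.0, 3.0]
thresholds = [optimal_threshold(U, 0.5, G, 2.0, 0.0) for G in energy_costs]
```

For these values the thresholds never decrease as G grows, in line with Proposition 5.3; a single sweep is of course only an illustration, not a proof.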



A.8 Proof of Proposition 5.4 (step utility)

Proof. We assume that the optimal policy consists of remaining active in at least one state (otherwise s* = M + 1 and E[r; s] = 0). We consider two cases,

i) k > s:

$$E[r; s] = \frac{1}{s + \frac{1-p}{p}}\left(vk - \frac{G}{p} - P + B\right) \tag{83}$$

ii) k < s:

$$E[r; s] = \frac{1}{s + \frac{1-p}{p}}\left(vs + \frac{v(1-p)}{p} - \frac{v(1-p)^{k-s+1}}{p} - \frac{G}{p} - P + B\right)$$

We now show that in the above two cases the optimal threshold can be efficiently computed by comparing the values of E[r; s] at five points. To this goal, we first assume that the optimal threshold, s*, can take real values (the assumption will be removed in the next paragraph). We refer to s* as an interior maximum if 1 < s* < M, and as a boundary maximum if s* = 1 or s* = M. Then, a necessary condition for s* to be an interior maximum of E[r; s] consists of s* being a root of dE[r; s]/ds = 0. If k > s then dE[r; s]/ds = 0 has no roots in the interval [1, M], since (83) is monotonic with respect to s. If k < s, let φ be the root of dE[r; s]/ds = 0. Then,



9 



p+W (G + pP -pB) exp 



ln(l -p) (1 + kp)+p 



P 



p + \n(l-p)(l-p) / (ln(l 



In (|84|) . W(x) denotes the Lambert function, i.e., W(x) — w if we w — x. 

Accounting for the fact that s* is an integer, and that it might be either an interior or boundary maximum 
yields, 

1, if k > s* and vk - G/p- P + B > 

fc-1, if k > s* and vk - G/p- P + B < 

ip = min(|V|,M), if k < s* and E[r; i>]>E[r; tp + 1] 
min( [_(p\ , M) , otherwise 



(85)

Therefore, s* can be computed comparing the value of E[r; s] at s = 1, k - 1, ⌈φ⌉, ⌊φ⌋, M.

□

A.9 Linear utility

When U(x) = M - x, the expected reward is

$$E[r; s] = \frac{1}{s + \frac{1-p}{p}}\left[(s-1)M - \frac{s(s-1)}{2} + \sum_{j=0}^{M-s}(M-s-j)(1-p)^{j} - \frac{G}{p} - P + B\right]$$

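A.10 Expected age

If p > 0, the expected age, A, is $A = \sum_{i=1}^{M} i\,\pi_i$,

$$A = \sum_{i=1}^{s}\frac{i}{s+(1-p)/p} + \sum_{i=s+1}^{M-1}\frac{i\,(1-p)^{i-s}}{s+(1-p)/p} + \frac{M(1-p)^{M-s}}{p\,\big(s+(1-p)/p\big)}
= \frac{p^2s^2 - p^2 s - 2(1-p)^{M-s}(1-p) + 2sp + 2 - 2p}{2p\,(sp + 1 - p)} \tag{86}$$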
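The closed form of the expected age can be cross-checked against a direct renewal-reward computation. The sketch below is an illustration under stated assumptions: the function names and the truncation length `terms` are hypothetical, and the second function simply sums the ages visited in one cycle (ages 1..s-1 while inactive, then ages capped at M while active) and divides by the expected cycle length s - 1 + 1/p.

```python
def expected_age_closed_form(s, M, p):
    """Closed-form expected age of the threshold-s policy, as in (86)."""
    q = 1.0 - p
    num = (p * p * s * s - p * p * s - 2.0 * q ** (M - s + 1)
           + 2.0 * s * p + 2.0 - 2.0 * p)
    return num / (2.0 * p * (s * p + 1.0 - p))


def expected_age_renewal(s, M, p, terms=500):
    """Renewal-reward check: expected sum of ages per cycle / cycle length.

    The geometric tail is truncated after `terms` active slots, which is
    accurate only when (1 - p) ** terms is negligible.
    """
    q = 1.0 - p
    idle = sum(range(1, s))  # ages 1..s-1, one slot each
    # The j-th active slot (j = 0, 1, ...) is reached with probability q^j
    # and the age in that slot is min(s + j, M).
    active = sum(q ** j * min(s + j, M) for j in range(terms))
    return (idle + active) / (s - 1 + 1.0 / p)


# Hypothetical check with M = 6 and two contact probabilities.
M = 6
ages = [expected_age_closed_form(s, M, 0.5) for s in range(1, M + 1)]
```

With p = 1 the closed form reduces to (s + 1)/2, and for the values above the age grows with the threshold, which is the content of Proposition A.4.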

A.11 Proposition A.4

Proposition A.4. The expected age is an increasing function of s.

Proof. Let A(s) be the average age of a threshold policy whose threshold is s. Consider two threshold policies, with corresponding thresholds s and s + 1. Then,

A(S) - ^ s + i^ + .^ .+ 1-2 + p , + !_£ (87) 

i=l p i=s+l p P 

1=1 p i=s+2 P P 

Algebraic manipulation of (l87)) - (l88l) yields 



am =y — ^+ y - i ■ 



(89) 



(90)

Subtracting the left-hand side of (90) from the left-hand side of (89), and denoting the difference by Δ_1, yields



i l-p 

s -> 



A: = A(s) p • A(s + l)(l-p) 
s + 1 + p 

= A(s) - A(s + 1) - A(s) 1 , + A( s + l)p 

s + 1 H 

p 

= (A(*)-A(« + l))(l-p)+A( fl )— (91) 

s + 1 H - 

p 

Subtracting the right-hand side of (90) from the right-hand side of (89), and denoting the difference by Δ_2, yields



^Etttts- (92) 

Letting Δ_1 = Δ_2,



(A( a ) - A(s + 1))(1 - p)= -^-^ ( - E* " J ( 93 ) 

Next, we make the dependence of the age on p explicit, and denote it by A(s, p).
The age is minimized when p = 1,

$$A(s, 1) \le A(s, p), \quad 0 < p \le 1 \tag{94}$$

Letting p = 1 in (86),

$$A(s, 1) = \sum_{i=1}^{s}\frac{i}{s} = \frac{s+1}{2} \tag{95}$$

(95) implies that the right-hand side of (93) is negative. Therefore, it follows from (93)-(95) that A(s) < A(s + 1) and, for s_1 < s_2, A(s_1) < A(s_2). □

A.12 Proof of Proposition 6.1

Proof. Let f be a function that maps a bonus level into the corresponding optimal threshold selected by users that solve the User Problem,

$$f : B \mapsto s^* \tag{96}$$

$$\mathbb{R} \to \mathbb{N} \tag{97}$$

Let g be a function that maps a threshold s into the minimum bonus level B such that s is the optimal threshold under B,

$$g : s \mapsto B \tag{98}$$

$$\mathbb{N} \to \mathbb{R} \tag{99}$$

Let h be a function that maps a threshold s into the pair of bonus levels $(\underline{B}(s), \overline{B}(s))$ such that s is the optimal threshold under any bonus in the range $[\underline{B}(s), \overline{B}(s)]$,

$$h : s \mapsto (\underline{B}(s), \overline{B}(s)) \tag{100}$$

$$\mathbb{N} \to \mathbb{R}^2 \tag{101}$$

Note that f admits no inverse (changing B might not alter s*). The function g, in contrast, is injective, since given B the optimal threshold s*(B) is unique.

The publisher must choose B so as to minimize A. We consider two cases, varying according to the value of s(B) evaluated at B = P,

case i)

$$s(P) < \frac{Np}{T} - \frac{1-p}{p}$$

In this case, inequality (16) is not satisfied when B = P. Since B cannot surpass P, the problem admits no solution.

case ii)

$$s(P) \ge \frac{Np}{T} - \frac{1-p}{p}$$

In this case, the problem admits a solution, to be derived next.

According to Proposition A.4 the average age A is an increasing function of s. According to Proposition 5.3 the optimal threshold s* is a non-increasing function of B. Therefore, the objective function A of the Publisher Problem is a non-increasing function of B.

The fact that A is a non-increasing function of B, together with the fact that the l.h.s. of inequality (16) is increasing in B, imply that the optimal bonus consists of setting B as large as possible, without violating (16).

Inequality (16) binds when

$$s = \frac{Np}{T} - \frac{1-p}{p} \tag{102}$$

Therefore, the optimal threshold, which must be integer, is

$$\tilde{s} = \min\left(M + 1, \left\lceil \frac{Np}{T} - \frac{1-p}{p} \right\rceil\right) \tag{103}$$

The solution of the Publisher Problem consists of finding B such that $s^* = \tilde{s}$.

The bonus B must be chosen in the range $[\underline{B}(s^*), \overline{B}(s^*)]$. Since s* is a non-increasing function of B (Proposition 5.3), the value of B which yields $s^* = \min\left(M + 1, \left\lceil \frac{Np}{T} - \frac{1-p}{p} \right\rceil\right)$ can be found using a binary search algorithm.

□
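The binary search mentioned at the end of the proof can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the users' response is obtained by brute force over thresholds, the target threshold `s_target` is taken as given (it plays the role of the integer threshold in (103)), and the function names, tolerance and parameter values are hypothetical. The search relies only on the fact that the optimal threshold is non-increasing in B.

```python
def expected_reward(s, U, p, G, P, B):
    """E[r; s] for the policy active at ages x >= s; s = M + 1 is always inactive."""
    M = len(U)
    if s == M + 1:
        return 0.0
    idle = sum(U[x - 1] for x in range(1, s))
    active = sum(U[x - 1] * (1 - p) ** (x - s) for x in range(s, M + 1))
    return (idle + active - G / p - P + B) / (s - 1 + 1 / p)


def optimal_threshold(U, p, G, P, B):
    """Users' best response to bonus B (smallest maximizer of E[r; s])."""
    return max(range(1, len(U) + 2),
               key=lambda s: expected_reward(s, U, p, G, P, B))


def largest_bonus(U, p, G, P, s_target, tol=1e-6):
    """Largest B in [0, P] (up to tol) whose induced threshold is >= s_target.

    Returns None when even B = 0 induces a threshold below s_target.
    """
    lo, hi = 0.0, float(P)
    if optimal_threshold(U, p, G, P, hi) >= s_target:
        return hi
    if optimal_threshold(U, p, G, P, lo) < s_target:
        return None
    # Invariant: threshold(lo) >= s_target > threshold(hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if optimal_threshold(U, p, G, P, mid) >= s_target:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical example: linear utility, M = 5, price P = 6.
bonus = largest_bonus([4, 3, 2, 1, 0], p=0.5, G=1.0, P=6.0, s_target=3)
```

In this example the users switch from threshold 3 to threshold 2 once the bonus exceeds 3.5, so the search converges to that value from below.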



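B Additional results when P_3G → ∞

B.1 Matrix form and linear program

$$A_1 = \begin{pmatrix}
1 & -1 & & & \\
 & 1 & -1 & & \\
 & & \ddots & \ddots & \\
 & & & 1 & -1 \\
 & & & & 0
\end{pmatrix} \tag{104}$$

$$A_2 = \begin{pmatrix}
1-p & -(1-p) & & & \\
-p & 1 & -(1-p) & & \\
\vdots & & \ddots & \ddots & \\
-p & & & 1 & -(1-p) \\
-p & & & & p
\end{pmatrix} \tag{105}$$

Let V be the solution of the optimality equations.

$$A_1 V \ge U \tag{106}$$

$$A_2 V \ge U - G + pP - pB \tag{107}$$

Letting $\tilde{A}_1$ and $\tilde{A}_2$ denote matrices $A_1$ and $A_2$ with their last line and column removed yields the following LP

$$\min |\tilde{V}| \tag{108}$$

$$\tilde{V} \ge \tilde{A}_1^{-1} U \tag{109}$$

$$\tilde{V} \ge \tilde{A}_2^{-1}(U - G + pP - pB) \tag{110}$$

where $\tilde{V}$ is the vector V with its last element removed.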

C Derivation of Main Results Accounting for 3G and WiFi

We now analyze the case in which 3G is available. The optimality conditions are, for 1 ≤ x ≤ M,

$$V(x) + g_u = \max\big(U(x) + K(x),\ U(x) - G + pV(1) + (1-p)K(x) - pP + pB,\ U(x) - G + V(1) - pP - (1-p)P_{3G} + B\big) \tag{111}$$

where K(x) = V(min(M, x + 1)).

Let F(a, x) be the relative reward at state x when action a is chosen, plus g_u,

$$F(0, x) = U(x) + V(\min(M, x+1)) \tag{112}$$

$$F(1, x) = U(x) - G + pV(1) + (1-p)V(\min(M, x+1)) - pP + pB \tag{113}$$

$$F(2, x) = U(x) - G + V(1) - pP - (1-p)P_{3G} + B. \tag{114}$$

The optimality conditions are, for x = 1, ..., M,

$$V(x) = \max\big(F(0, x) - g_u,\ F(1, x) - g_u,\ F(2, x) - g_u\big). \tag{115}$$

Let $\mathcal{F}(a, x)$, a = 0, 1, 2, and x = 1, ..., M, be boolean variables such that

$$\mathcal{F}(0, x) = T \iff F(0, x) \ge \max_i F(i, x) \tag{116}$$

$$\mathcal{F}(1, x) = T \iff F(1, x) \ge \max_i F(i, x) \tag{117}$$

$$\mathcal{F}(2, x) = T \iff F(2, x) \ge \max_i F(i, x) \tag{118}$$

From (112)-(114), a policy that selects a(x), x = 1, ..., M, as follows, is optimal

$$a(x) = \begin{cases}
0, & \text{if } V(1) - V(\min(M, x+1)) \le G/p + P - B \text{ and } V(1) - V(\min(M, x+1)) \le G + pP + (1-p)P_{3G} - B \quad \text{(i)} \\
1, & \text{if } V(1) - V(\min(M, x+1)) \ge G/p + P - B \text{ and } V(1) - V(\min(M, x+1)) \le P_{3G} - B \quad \text{(ii)} \\
2, & \text{if } V(1) - V(\min(M, x+1)) \ge P_{3G} - B \text{ and } V(1) - V(\min(M, x+1)) \ge G + pP + (1-p)P_{3G} - B \quad \text{(iii)}
\end{cases} \tag{120}$$

where ties are broken arbitrarily.

C.1 Proposition C.1

Proposition C.1. V(x) is a non-increasing function.

Proof. According to (120)-(ii), if P_3G < G/p + P, there exists an optimal policy in which action 1 is not selected. Therefore, in what follows we separately consider two scenarios, P_3G < G/p + P and P_3G ≥ G/p + P.

scenario 1)

$$P_{3G} < \frac{G}{p} + P \tag{121}$$

In this scenario, there exists an optimal policy wherein a(x) ≠ 1, x = 1, ..., M. We show by induction that

• V(M - i - 1) - V(M - i) ≥ 0 for i = 0, ..., M - 1;

• $\mathcal{F}(M - i, 0) = T \Rightarrow \mathcal{F}(M - i - 1, 0) = T$ for i = 0, ..., M - 1.

(120)(i) and (120)(iii) yield the following remark, used in the analysis of the base cases of the inductive arguments that follow.

Remark C.1. $\mathcal{F}(M, 0) = T \Rightarrow \mathcal{F}(M - 1, 0) = T$. ■

We consider two cases, $\mathcal{F}(M, 0) = T$ and $\mathcal{F}(M, 2) = T$.

• $\mathcal{F}(M, 0) = T$:

Base case: We first show that V(M - 1) ≥ V(M). It follows from (115) and Remark C.1 that

$$V(M-1) = U(M-1) + V(M) - g_u \tag{122}$$

$$V(M) = U(M) + V(M) - g_u \ge U(M) + V(1) - pP - G - (1-p)P_{3G} + B - g_u \tag{123}$$

Hence, (122)-(123) together with the fact that U is non-increasing and U(M) = 0 yield V(M - 1) ≥ V(M) ≥ V(1) - pP - G - (1 - p)P_3G + B - g_u.

Induction hypothesis: Assume that V(M - m - 1) - V(M - m) ≥ 0 for m < t and that $\mathcal{F}(M - m, 0) = T$ for m < t.

Induction step: Next, we show that V(M - t - 1) - V(M - t) ≥ 0 and that $\mathcal{F}(M - t - 1, 0) = T$. It follows from the induction hypothesis that

$$V(1) - pP - G - (1-p)P_{3G} + B \le V(M-t+1) \le V(M-t) \tag{124}$$

and $\mathcal{F}(M - t, 0) = T$. Therefore,

$$V(1) - V(M-t) \le V(1) - V(M-t+1) \le pP + G + (1-p)P_{3G} - B \tag{125}$$

The rightmost inequality in (125) and optimality conditions (115) imply that $\mathcal{F}(M - t - 1, 0) = T$, which yields

$$V(M-t-1) = U(M-t-1) + V(M-t) - g_u \tag{126}$$

$$V(M-t) = U(M-t) + V(M-t+1) - g_u \tag{127}$$

Therefore, V(M - t + 1) ≤ V(M - t) (induction hypothesis) and U(M - t) ≤ U(M - t - 1) together with (126)-(127) yield V(M - t) ≤ V(M - t - 1).

• $\mathcal{F}(M, 2) = T$:

Base case: We first show that V(M - 1) ≥ V(M). It follows from the optimality conditions (115) and Remark C.1 that

$$V(M-1) = U(M-1) + V(1) - G - pP - (1-p)P_{3G} - g_u + B \tag{128}$$

$$V(M) = U(M) + V(1) - G - pP - (1-p)P_{3G} - g_u + B. \tag{129}$$

Hence, (128)-(129) and U(M) ≤ U(M - 1) yield V(M - 1) ≥ V(M).

Induction hypothesis: Assume that V(M - m - 1) - V(M - m) ≥ 0 for m < t and that $\mathcal{F}(M - m + 1, 0) = T \Rightarrow \mathcal{F}(M - m, 0) = T$, for m < t.

Induction step: Next, we show that V(M - t - 1) - V(M - t) ≥ 0 and $\mathcal{F}(M - t, 0) = T \Rightarrow \mathcal{F}(M - t - 1, 0) = T$. We consider three cases,

- $\mathcal{F}(M - t, 0) = T$: This case is similar to the corresponding one for $\mathcal{F}(M, 0) = T$.

- $\mathcal{F}(M - t, 2) = T$ and $\mathcal{F}(M - t - 1, 2) = T$: Next, we show that V(M - t - 1) - V(M - t) ≥ 0.

$$V(M-t) = U(M-t) + V(1) - G - pP - (1-p)P_{3G} - g_u + B \tag{130}$$

$$V(M-t-1) = U(M-t-1) + V(1) - G - pP - (1-p)P_{3G} - g_u + B. \tag{131}$$

Since U(M - t) ≤ U(M - t - 1), it follows from (130)-(131) that V(M - t) ≤ V(M - t - 1).

- $\mathcal{F}(M - t, 2) = T$ and $\mathcal{F}(M - t - 1, 0) = T$: Next, we show that V(M - t - 1) - V(M - t) ≥ 0.

$$V(M-t) = U(M-t) + V(1) - G - pP - (1-p)P_{3G} - g_u + B \tag{132}$$

$$V(M-t-1) = U(M-t-1) + V(M-t) \tag{133}$$

Since U(M - t - 1) ≥ 0, it follows from (133) that V(M - t) ≤ V(M - t - 1).



scenario 2)

$$P_{3G} \ge \frac{G}{p} + P \tag{134}$$

We show by induction that, for i = 0, ..., M - 1,

• V(M - i - 1) ≥ V(M - i),

• $\mathcal{F}(M - i, 0) = T \Rightarrow \mathcal{F}(M - i - 1, 0) = T$,

• $\mathcal{F}(M - i, 1) = T \Rightarrow (\mathcal{F}(M - i - 1, 1) = T$ or $\mathcal{F}(M - i - 1, 0) = T)$.

We consider three cases, $\mathcal{F}(M, 0) = T$, $\mathcal{F}(M, 1) = T$ and $\mathcal{F}(M, 2) = T$.

(120)(i), (120)(ii) and (120)(iii), together, yield the following remarks, used in the analysis of the base cases of the inductive arguments that follow.

Remark C.2. $\mathcal{F}(M, 1) = T \Rightarrow \mathcal{F}(M - 1, 1) = T$. ■

Remark C.3. $\mathcal{F}(M, 2) = T \Rightarrow \mathcal{F}(M - 1, 2) = T$. ■

• $\mathcal{F}(M, 0) = T$: The proof is similar to that of scenario 1.

• $\mathcal{F}(M, 1) = T$:

Base case: It follows from Remark C.2 and (112)-(115) that

$$V(M-1) = U(M-1) + pV(1) - pP + pB + (1-p)V(M) - g_u \tag{135}$$

$$V(M) = U(M) + pV(1) - pP + pB + (1-p)V(M) - g_u. \tag{136}$$

Hence, (135)-(136) yield V(M - 1) ≥ V(M).

Induction hypothesis: Assume that, for m = 0, ..., t - 1,

a) V(M - m - 1) ≥ V(M - m)

b) $\mathcal{F}(M - m, 0) = T \Rightarrow \mathcal{F}(M - m - 1, 0) = T$

c) $\mathcal{F}(M - m, 1) = T \Rightarrow (\mathcal{F}(M - m - 1, 1) = T$ or $\mathcal{F}(M - m - 1, 0) = T)$

Induction step: Next, we show that

a) V(M - t - 1) ≥ V(M - t)

b) $\mathcal{F}(M - t, 0) = T \Rightarrow \mathcal{F}(M - t - 1, 0) = T$

c) $\mathcal{F}(M - t, 1) = T \Rightarrow (\mathcal{F}(M - t - 1, 1) = T$ or $\mathcal{F}(M - t - 1, 0) = T)$

It follows from the induction hypothesis that V(M - t + 1) ≤ V(M - t). We consider three cases,

- $\mathcal{F}(M - t, 0) = T$: The proof is similar to the corresponding one for H(M, 1) < H(M, 0) in the proof of Proposition A.1.

- $\mathcal{F}(M - t, 1) = T$: The proof is similar to the corresponding one for H(M, 1) > H(M, 0) in the proof of Proposition A.1.

- $\mathcal{F}(M - t, 2) = T$: Due to the induction hypothesis (cases b) and c)), one of the two cases above holds. Hence, it is not necessary to consider the case $\mathcal{F}(M - t, 2) = T$.

• $\mathcal{F}(M, 2) = T$:

Base case: It follows from Remark C.3 that V(M - 1) ≥ V(M).

Induction hypothesis: Assume that, for m = 0, ..., t - 1,

- V(M - m - 1) ≥ V(M - m)

- $\mathcal{F}(M - m, 0) = T \Rightarrow \mathcal{F}(M - m - 1, 0) = T$

- $\mathcal{F}(M - m, 1) = T \Rightarrow (\mathcal{F}(M - m - 1, 1) = T$ or $\mathcal{F}(M - m - 1, 0) = T)$

Induction step: Next, we show that

- V(M - t - 1) ≥ V(M - t)

- $\mathcal{F}(M - t, 0) = T \Rightarrow \mathcal{F}(M - t - 1, 0) = T$

- $\mathcal{F}(M - t, 1) = T \Rightarrow (\mathcal{F}(M - t - 1, 1) = T$ or $\mathcal{F}(M - t - 1, 0) = T)$

We consider the following cases.

- $\mathcal{F}(M - t, 0) = T$: The proof is similar to that of scenario 1.

- $\mathcal{F}(M - t, 1) = T$: First, we show that

$$(\mathcal{F}(M-t-1, 1) = T) \text{ or } (\mathcal{F}(M-t-1, 0) = T) \tag{137}$$

For the sake of contradiction, assume

$$(\mathcal{F}(M-t-1, 2) = T) \text{ and } \big((\mathcal{F}(M-t-1, 0) \ne T) \text{ and } (\mathcal{F}(M-t-1, 1) \ne T)\big) \tag{138}$$

Then,

$$F(2, M-t-1) \ge F(1, M-t-1) \tag{139}$$

$$F(1, M-t) \ge F(2, M-t) \tag{140}$$

Replacing (112)-(114) into (139)-(140),

$$-G + V(1) - pP - (1-p)P_{3G} + B \ge -G + pV(1) + (1-p)V(M-t) - pP + pB \tag{141}$$

$$-G + pV(1) + (1-p)V(M-t+1) - pP + pB \ge -G + V(1) - pP - (1-p)P_{3G} + B \tag{142}$$

From (141) and (142) we obtain, respectively,

$$V(1) - V(M-t) - P_{3G} + B \ge 0 \tag{143}$$

$$-V(1) + V(M-t+1) + P_{3G} - B \ge 0 \tag{144}$$

Applying the induction hypothesis to (143),

$$V(1) - V(M-t+1) - P_{3G} + B \ge 0 \tag{145}$$

Therefore, from (144) and (145),

$$V(M-t+1) + P_{3G} - B = V(1) \tag{146}$$

(146) together with (112)-(114) imply that both $\mathcal{F}(M - t + 1, 2) = T$ and $\mathcal{F}(M - t + 1, 1) = T$ are optimal, which contradicts (138).

To show that V(M - t - 1) ≥ V(M - t), an argument similar to the corresponding one for H(M, 1) > H(M, 0) in the proof of Proposition A.1 can be applied.

- $\mathcal{F}(M - t, 2) = T$: We consider three subcases.

* $\mathcal{F}(M - t, 2) = T$ and $\mathcal{F}(M - t - 1, 2) = T$: The proof is similar to the corresponding case in scenario 1.

* $\mathcal{F}(M - t, 2) = T$ and $\mathcal{F}(M - t - 1, 0) = T$: The proof is similar to the corresponding case in scenario 1.

* $\mathcal{F}(M - t, 2) = T$ and $\mathcal{F}(M - t - 1, 1) = T$: In this case, it follows from F(1, M - t - 1) ≥ F(2, M - t - 1) that

$$-G + pV(1) + (1-p)V(M-t) - pP + pB \ge -G + V(1) - pP - (1-p)P_{3G} + B \tag{147}$$

From (147),

$$pV(1) + (1-p)V(M-t) + pB \ge V(1) - (1-p)P_{3G} + B \tag{148}$$

Therefore,

$$U(M-t-1) + pV(1) + (1-p)V(M-t) + pB \ge U(M-t) + V(1) - (1-p)P_{3G} + B \tag{149}$$

(149) together with (112)-(114) imply that V(M - t - 1) ≥ V(M - t).

□

C.2 Proposition SH] 

Proof. The proof follows from Proposition C.1 and optimality conditions (120). □

C.3 Proposition C.2 (conditions for the optimal policy to consist of using only one action)

Proposition C.2. Let a(x) = a, x = 1, ..., M be an optimal policy.

If P_3G < G/p + P and

• a = 2, then

$$U(1) - U(2) > G + (1-p)P_{3G} + pP - B. \tag{150}$$

• a = 0, then

$$\sum_{j=1}^{M-1} U(M-j) < G + (1-p)P_{3G} + pP - B. \tag{151}$$

If P_3G ≥ G/p + P and

• a = 0, then

$$\sum_{j=1}^{M-1} U(M-j) < \frac{G}{p} + P - B. \tag{152}$$

• a = 1, then

$$U(1) - p\sum_{i=1}^{M-1} U(i)(1-p)^{i-1} > (1-p)(G/p + P - B) \tag{153}$$

$$\sum_{i=1}^{M-1} U(i)(1-p)^{i-1} < P_{3G} - B. \tag{154}$$

• a = 2, then

$$U(1) - U(2) > P_{3G} - B. \tag{155}$$

Proof. The proof is similar in spirit to the one of Proposition 5.1. □

Remark C.4. If the optimal threshold policy is unique, the conditions in Proposition C.2 are necessary and sufficient for the optimal policy to satisfy (150)-(155).

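C.4 Optimal threshold properties with 3G

To derive the optimal threshold strategy properties, we distinguish two scenarios: P_3G < G/p + P and P_3G ≥ G/p + P.

scenario 1)

$$P_{3G} < \frac{G}{p} + P \tag{156}$$

In this scenario,

$$a(x) = \begin{cases} 0, & \text{if } x < s_{3G} \\ 2, & \text{otherwise} \end{cases} \tag{157}$$

The transition matrix of the system, $\mathcal{P}_u$, has entries

$$(\mathcal{P}_u)_{x,y} = \begin{cases} 1, & \text{if } x < s_{3G} \text{ and } y = x + 1 \\ 1, & \text{if } x \ge s_{3G} \text{ and } y = 1 \\ 0, & \text{otherwise} \end{cases} \tag{158}$$

Let π be the steady state solution of the system for a threshold s_3G, $\pi \mathcal{P}_u = \pi$. Then,

$$\pi_i = \pi_1 = \frac{1}{s_{3G}}, \quad i = 1, \ldots, s_{3G}$$

$$\pi_i = 0, \quad i = s_{3G} + 1, \ldots, M$$

The expected age, $A(s_{3G}) = \sum_{i=1}^{M} i\,\pi_i$, is

$$A(s_{3G}) = \begin{cases} (s_{3G} + 1)/2, & \text{if } 1 \le s_{3G} \le M \\ M, & \text{if } s_{3G} = M + 1 \end{cases}$$

The average reward of this threshold policy is

$$E[r; s_{3G}] = \begin{cases} \dfrac{1}{s_{3G}}\left[\sum_{i=1}^{s_{3G}} U(i) - G - pP - (1-p)P_{3G} + B\right], & \text{if } 1 \le s_{3G} \le M \\ 0, & \text{if } s_{3G} = M + 1. \end{cases}$$

The optimal threshold value $s^*_{3G}$ is

$$s^*_{3G} = \min\left\{ s \;\Big|\; \sum_{i=1}^{s} U(i) - sU(s+1) \ge G + pP + (1-p)P_{3G} - B \right\}$$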
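The threshold rule for scenario 1 can be compared with a brute-force maximization of the average reward. The sketch below is illustrative only: the function names and the parameter values are assumptions, U(M + 1) is taken to be 0 so that the rule is also defined at s = M, and the threshold M + 1 denotes the always-inactive policy.

```python
def reward_3g(s, U, p, G, P, P3G, B):
    """Average reward when the user idles at ages x < s and uses 3G at age s."""
    M = len(U)
    if s == M + 1:
        return 0.0
    cost = G + p * P + (1 - p) * P3G - B
    return (sum(U[:s]) - cost) / s


def threshold_rule_3g(U, p, G, P, P3G, B):
    """Smallest s with sum_{i<=s} U(i) - s * U(s+1) >= G + pP + (1-p)P3G - B."""
    M = len(U)
    cost = G + p * P + (1 - p) * P3G - B
    for s in range(1, M + 1):
        next_utility = U[s] if s < M else 0.0  # assumption: U(M + 1) = 0
        if sum(U[:s]) - s * next_utility >= cost:
            return s
    return M + 1  # no threshold qualifies: remain always inactive


# Hypothetical example with P3G < G/p + P (scenario 1).
U = [4, 3, 2, 1, 0]
params = dict(p=0.5, G=1.0, P=2.0, P3G=3.0, B=0.0)
s_rule = threshold_rule_3g(U, **params)
s_brute = max(range(1, len(U) + 2), key=lambda s: reward_3g(s, U, **params))
```

The rule stops at the first threshold for which moving to s + 1 no longer increases the average reward, which is why it agrees with the brute-force maximizer in this example.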



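scenario 2) $P_{3G} \ge \frac{G}{p} + P$

In this scenario, let s_W = s,

$$a(x) = \begin{cases} 0, & \text{if } x < s_W \\ 1, & \text{if } s_W \le x < s_{3G} \\ 2, & \text{otherwise} \end{cases}$$

The case s_3G = M + 1 was considered in Appendix A. Next, we consider the case s_3G ≤ M. The transition matrix of the system, $\mathcal{P}_u$, has entries

$$(\mathcal{P}_u)_{x,y} = \begin{cases} 1, & \text{if } x < s_W \text{ and } y = x + 1 \\ p, & \text{if } s_W \le x < s_{3G} \text{ and } y = 1 \\ 1 - p, & \text{if } s_W \le x < s_{3G} \text{ and } y = x + 1 \\ 1, & \text{if } x \ge s_{3G} \text{ and } y = 1 \\ 0, & \text{otherwise} \end{cases} \tag{167}$$

The steady state solution, π, of the above system is

$$\pi_i = \pi_1, \quad i = 1, \ldots, s_W \tag{168}$$

$$\pi_i = \pi_{i-1}(1-p) = \pi_1(1-p)^{i - s_W}, \quad i = s_W + 1, \ldots, s_{3G} \tag{169}$$

$$\pi_i = 0, \quad i = s_{3G} + 1, \ldots, M \tag{170}$$

$$\sum_{i=1}^{M} \pi_i = 1 \tag{171}$$

To obtain π_1, we sum (168) and (169),

$$s_W \pi_1 + \pi_1 \sum_{i = s_W + 1}^{s_{3G}} (1-p)^{i - s_W} = 1 \tag{172}$$

therefore,

$$\pi_1 = \frac{1}{s_W - 1 + \big(1 - (1-p)^{s_{3G} - s_W + 1}\big)p^{-1}}. \tag{173}$$

The expected reward is

$$E[r; s_W, s_{3G}] = \pi_1 \left[\sum_{i=1}^{s_W - 1} U(i) + \sum_{i = s_W}^{s_{3G}} (1-p)^{i - s_W} U(i) - (G/p + P - B) - (P_{3G} - P - G/p)(1-p)^{s_{3G} - s_W + 1}\right] \tag{174}$$

The optimal thresholds $(s^*_W, s^*_{3G})$ are $(s^*_W, s^*_{3G}) = \arg\max_{(s_W, s_{3G})} \{E[r; s_W, s_{3G}]\}$.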
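The two-threshold reward can be evaluated directly and maximized by exhaustive search over the pairs with s_W ≤ s_3G ≤ M. The sketch below is a hedged illustration: the function name, the 0-based utility list and the parameter values are assumptions made for the example, and the search is plain enumeration rather than any method proposed in the paper.

```python
def reward_two_thresholds(sw, s3g, U, p, G, P, P3G, B):
    """E[r; s_W, s_3G]: idle below sw, WiFi on [sw, s3g), 3G from s3g on."""
    q = 1.0 - p
    n = s3g - sw
    pi1 = 1.0 / (sw - 1 + (1.0 - q ** (n + 1)) / p)  # steady-state mass of age 1
    idle = sum(U[i - 1] for i in range(1, sw))
    active = sum(q ** (i - sw) * U[i - 1] for i in range(sw, s3g + 1))
    cost = (G / p + P - B) + (P3G - P - G / p) * q ** (n + 1)
    return pi1 * (idle + active - cost)


# Hypothetical example with P3G >= G/p + P (scenario 2).
U = [4, 3, 2, 1, 0]
params = dict(p=0.5, G=1.0, P=2.0, P3G=5.0, B=0.0)
M = len(U)
pairs = [(sw, s3g) for sw in range(1, M + 1) for s3g in range(sw, M + 1)]
best_pair = max(pairs, key=lambda t: reward_two_thresholds(t[0], t[1], U, **params))

# Sanity check: with sw == s3g the user never relies on WiFi alone, and the
# reward reduces to (sum_{i<=s} U(i) - G - pP - (1-p)P3G + B) / s.
single = reward_two_thresholds(2, 2, U, **params)
```

When the two thresholds coincide, the expression collapses to the single-threshold 3G reward of scenario 1, which gives a quick consistency check of the formula.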
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C.5 On the number of optimal thresholds 

If there exists an optimal policy which consists of switching between being inactive and using WiFi, the results presented in §A.4 hold if we restrict to policies for which a(x) ∈ {0, 1} (x = 1, ..., M). In contrast, if there is an optimal policy which consists of adopting 3G at state s_3G ≤ M, the optimal policy is in general not unique, since changing the actions chosen at states s > s_3G does not compromise the optimality of the resulting policy. A characterization of the number of optimal two-threshold policies in this context is subject of future work.

D Learning algorithm

D.1 Proof of Proposition 6.2

In what follows, we assume that the Publisher Problem admits a solution B*, and that B* > 0.

Proof. We index the discrete rounds of Algorithm 1 by n (n = 1, 2, 3, ...),

$$B_{n+1} \leftarrow \min\left(\bar{B}, \max\left(0, B_n + \frac{\alpha}{n}(T - Q_n)\right)\right) \tag{175}$$

Then, we prove a generalized version of Proposition 6.2. Namely, we consider the following algorithm, obtained from (175) after removing the max and min operators,

$$B_{n+1} \leftarrow B_n + \frac{\alpha}{n}(T - Q_n) \tag{176}$$

Next, we show that (176) converges to the optimal solution of the Publisher Problem with probability 1. If $\bar{B}$ is sufficiently large, the convergence of (176) to the optimal solution of the Publisher Problem implies the convergence of (175) to such a solution as well (see [40, Lemma 3.3.8 and §5.4] for details).
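The projected update (175) is straightforward to simulate. The sketch below is a toy illustration under stated assumptions: `sample_Q` is a hypothetical callback returning the number of transmissions observed in a round for a given bonus, and the binomial user model with an activation probability that grows with B is invented for the example; it is not the trace-driven simulator used in the paper.

```python
import random


def learn_bonus(sample_Q, T, B_max, alpha, rounds, B0=0.0):
    """Iterate B_{n+1} = min(B_max, max(0, B_n + (alpha / n) * (T - Q_n)))."""
    B = B0
    history = [B]
    for n in range(1, rounds + 1):
        Q = sample_Q(B)  # observed number of transmissions in round n
        B = min(B_max, max(0.0, B + (alpha / n) * (T - Q)))
        history.append(B)
    return history


def make_toy_sampler(N, seed=0):
    """Hypothetical user model: each of N users transmits with probability
    min(1, 0.05 + 0.01 * B), independently across users and rounds."""
    rng = random.Random(seed)

    def sample_Q(B):
        prob = min(1.0, 0.05 + 0.01 * B)
        return sum(1 for _ in range(N) if rng.random() < prob)

    return sample_Q


# Toy run: 100 users, target of 11 transmissions per round.
trajectory = learn_bonus(make_toy_sampler(N=100), T=11, B_max=100.0,
                         alpha=10.0, rounds=200)
```

The step size alpha / n decreases over rounds, so late observations move the bonus less than early ones; the min and max operators only keep the iterate inside [0, B_max].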

D.1.1 Martingale definition

Let $M_{n+1} = E[Q_n] - Q_n$ and $g(B) = T - E[Q_n(B)]$. Then

$$B_{n+1} \leftarrow B_n + \frac{\alpha}{n}\big(g(B_n) + M_{n+1}\big) \tag{177}$$

The history of values {B_m} and {M_m}, m = 1, ..., n, yields a σ-field $\mathcal{F}_n$, $\mathcal{F}_n = \sigma(B_m, M_m, m \le n)$.

Note that {B_m}, m = 2, ..., n are fully characterized by B_1 and {M_m}, m = 1, ..., n. Therefore $\mathcal{F}_n = \sigma(B_1, M_m, m \le n)$.

Moreover, {M_n}, n = 1, 2, ... is a sequence of zero mean random variables satisfying

$$E[M_{n+1} \mid \mathcal{F}_n] = 0 \ \text{a.s.}, \quad n \ge 0.$$

As a consequence, {M_n} is a martingale difference sequence with respect to the increasing σ-fields $\mathcal{F}_n$, n = 1, 2, ....

Recall that N is the number of mobiles in the network. Note that

$$Q_n \le N \tag{178}$$

Therefore, $|M_{n+1}| = |E[Q_n] - Q_n| \le N$, thus {M_n} is square-integrable, i.e.,

$$E[|M_{n+1}|^2 \mid \mathcal{F}_n] \ \text{is finite} \tag{179}$$

D.1.2 Differential inclusion definition

Since the function $g(B) = T - E[Q_n(B)]$ is discontinuous, we use differential inclusions, an extension of ordinary differential equations, to approximate (176). The solution concept adopted for the differential inclusions is the Filippov solution [41].

Let s_0 be the number of discontinuity points of g in the interval $(0, \bar{B}]$. Let $B^{(i)}$ be the value of B at the i-th discontinuity point of g(B), i = 0, 1, 2, ..., s_0. Then, $B^{(0)} = 0 < B^{(1)} < \ldots < B^{(s_0)} = \bar{B}$ (see Figure 8).

Next, we approximate (176) by the following differential inclusion

$$\dot{B}(t) \in F(B(t)) \tag{180}$$

where F is a set-valued map

$$F : [0, \bar{B}] \to \text{subset of } \mathbb{R}$$

F is defined as follows

f {g{B)\, ftB^B®, i = l,...,s 

F(B) = { ^ (181) 

[ \g(B^),g{B^)], ifB = BW, i = l,...,s 

Since the function F is semi-continuous and, for all B(t), F(B(t)) is a compact and a convex set, there 
exists a solution to the differential inclusion (|180l) . In addition, it follows from Theorem 2 in [JT] that such 
a solution is unique. 

Let B* be the following solution to the Publisher Problem, 

B* = max j£« g > oj (182) 

Note that £ [g(.B w ) = .g(B*), .g(B( l+1 ))] and let 



T = max U < T 



U<Ti-— — ; r^=0^ (183) 

I " s B*)+ 1-v) h J v ; 



The basin of attraction of (|180[) is (see Figure [S]) , 



[B® =B*,Bl i+ % if T = T 

if T ^ f , where S(0 = B* 



We establish the global stability of the basin of attraction of (180) in §D.1.3 and then in §D.1.4 we use [40, Theorem 2, §5] to show that the algorithm (176) converges to the basin of attraction of (180).

D.1.3 Global stability of basin of attraction

In order to establish the global stability of the basin of attraction of (180), we use the following Lyapunov function

$$V(B) = (B - B^*)^2 \tag{184}$$

We begin by stating two definitions which generalize the notions of gradients and derivatives to encompass differential inclusions [42].

Definition D.1. The Clarke generalized gradient of V in B is defined as

$$\partial V(B) = \mathrm{co}\{\lim_{i \to \infty} \nabla V(B_i) : (B_i) \to (B)\} \tag{185}$$

where co{A} is the smallest convex and bounded set containing A.

Definition D.2. The set-valued derivative of V is defined as

$$\dot{V}(B) = \{a \in \mathbb{R} : \exists v \in F(B) \text{ so that } pv = a, \ \forall p \in \partial V(B)\}$$

It follows from the two definitions above that

$$\partial V(B) = \{2(B - B^*)\} \tag{186}$$

and

$$\dot{V}(B) = \begin{cases} \{2(B - B^*)g(B)\}, & \text{if } B \ne B^{(j)},\ j = 1, \ldots, s_0 \\ \big[2(B - B^*)g(B^{(j-1)}),\ 2(B - B^*)g(B^{(j)})\big], & \text{if } B = B^{(j)} \end{cases}$$



We divide the analysis of the stability of the basin of attraction of (180) into two cases, varying according to whether $T \ne \tilde{T}$ (see Figure 8(b)) or $T = \tilde{T}$ (see Figure 8(c)).

• If $T \ne \tilde{T}$ (see Figure 8(b)), $\dot{V}(B) < 0$ for B ≠ B*.

- $B \ne B^{(i)}$: If B > B*, then $g(B) = T - E[Q_n(B)] < 0$ since E[Q_n] is an increasing function of B (see Proposition 5.3). If B < B* the result follows similarly;

- $B = B^{(i)}$ and B ≠ B*: If B > B*, then $g(B^{(i-1)}) = T - E[Q_n(B^{(i-1)})] < 0$ since E[Q_n] is an increasing function of B (see Proposition 5.3) and $B^{(i-1)} < B^{(i)}$. If B < B* the result follows similarly.

• Using similar arguments, we can establish the stability of the basin of attraction when $T = \tilde{T}$ (see Figure 8(c)). Let $\underline{B}^* = B^* = B^{(i)}$ and $\overline{B}^* = B^{(i+1)}$. Then, $\dot{V}(B) < 0$ for $B \notin [\underline{B}^*, \overline{B}^*]$.

Therefore, the global stability of the basin of attraction of (180) follows from the Lyapunov theorem (see

BSD- 




D.1.4 Algorithm convergence to differential inclusion

Next, our goal is to show that (176) is well approximated by (180). To this goal, we define the linear interpolation of B_n and establish the finiteness of sup_n |B_n|.

Interpolation of {B_n}: Let B(t) be the linear interpolation of B_n, n = 1, 2, 3, ...,

$$B(t) = \begin{cases} B_n, & t = \sum_{i=1}^{n} \alpha/i \\ B_n + \dfrac{t - \sum_{i=1}^{n} \alpha/i}{\alpha/(n+1)}\,(B_{n+1} - B_n), & t \in \left(\sum_{i=1}^{n} \alpha/i,\ \sum_{i=1}^{n+1} \alpha/i\right) \end{cases} \tag{187}$$

B(t) is a piecewise linear and continuous function.

Finiteness of sup_n |B_n|: Next, we show that sup_n |B_n| < ∞. For B ∈ [0, ∞), F(B) is a compact and convex set and

sup \y\<g(B^)<g(B^)(l + B) (188) 

yeF(B) 

Let h_c(B) be defined as in [40, §3.2-(A5)], h_c(B) = g(cB)/c = (T - N/(s(cB) + (1 - p)/p))/c. Then, for B ∈ [0, ∞),

$$h_\infty(B) = 0 \tag{189}$$

Therefore, similar arguments as those in [40, §3.2] applied to (179), (188) and (189) yield

$$\sup_n |B_n| < \infty \tag{190}$$

Concluding the proof: Given that (190) (which corresponds to [40, (5.2.2)]) holds, Theorem 2 in [40, §5] establishes that almost surely every limit point of (187) satisfies (180) (see [40] for details). As a consequence, since the sequence {B_n}, n = 1, 2, ..., generated by (176) is contained in (187), it converges almost surely to (180) (see Corollary 4 of [40, §5] for details). □



D.2 Binary search and stochastic learning

In this paper we proposed the use of binary search to obtain the optimal bonus level when the publishers have complete information (see Proposition 6.1) and stochastic learning in face of incomplete information (see Proposition 6.2). A variation of binary search, as proposed in [33], can also be used in face of incomplete information. However, we do not have a proof of the convergence of the modified binary search algorithm in our setting. Since the binary search algorithm may have faster convergence to the optimal solution, it can be used to obtain an initial value for the stochastic learning, whose convergence was established in Proposition 6.2. A comparison of pros and cons of the modified binary search and stochastic learning is provided in [43].

Figure 8: Convergence of learning algorithm. (a) the threshold policy adopted by users is piecewise constant; (b) g(B) when $T \ne \tilde{T}$; (c) g(B) when $T = \tilde{T}$.


D.3 Simulation 

In order to 1) study the behavior of our algorithm when the population size varies and to 2) investigate how 
the convergence speed may be affected by the correlation among users, we conducted trace-driven simulations, 



whose results are shown in http : //www-net . cs .umass . edu/~sadoc/ agecontrol/bus-ap-ct20/index .html 



Simulation setup We set the parameters of our learning algorithm as follows: M=30, G=0.4, T=ll, 
_P=100 and t — 10 time slots. The number of users is initially 105 and drops to 90 at round 100. Finally, 
a=10 for trace driven simulations, and a — 20 under the uniformity and independence assumption. 

In the trace-driven simulations in this section, we consider half of the population in one bus and the other 
half in another. Recall that for each bus shift we generate a string of zeros and ones corresponding to slots with 
and without a useful contact opportunity, respectively. In order to distinguish the users in the two buses, while 
simulating the opportunities observed by each user we assume that half of them take their first observations at 
the beginning of the string of zeros and ones, while the other half take their first observations in the middle of 
the trace. After taking their first observations, the subsequent ones are drawn in sequence out of the string of 
zeros and ones (our simulator is available at http://www-net.cs.umass.edu/~sadoc/agecontrol/learning.tgz). 
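The way the two halves of the population read the same trace can be sketched as follows. This is a minimal 
illustration, not the authors' simulator: the function names are hypothetical, and wrapping around to the 
start of the string when its end is reached is an assumption, since the text does not say how the end of the 
trace is handled. 

```python
def user_observations(trace, start, length):
    """Read `length` consecutive slots of `trace`, beginning at index `start`.
    Wrapping around at the end of the trace is an assumption of this sketch."""
    n = len(trace)
    return [trace[(start + k) % n] for k in range(length)]


def split_population(trace, num_users, length):
    """Half of the users start reading at the beginning of the trace and the
    other half start in the middle, mimicking users riding two buses."""
    half = num_users // 2
    offsets = [0] * half + [len(trace) // 2] * (num_users - half)
    return [user_observations(trace, offset, length) for offset in offsets]


# Toy trace of per-slot indicators, for illustration only.
trace = [0, 0, 1, 1, 0, 1, 0, 1]
users = split_population(trace, num_users=4, length=4)
# users[0] and users[1] read slots 0..3; users[2] and users[3] read slots 4..7.
```

Users in the same half observe identical sequences, which is the source of the correlation among users 
studied in this section. 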



Despite the correlations among users, for the 88 bus shifts analyzed our trace-driven results did not 
significantly deviate from the ones obtained with uncorrelated users. In 
http://www-net.cs.umass.edu/~sadoc/agecontrol/bus- 
we show for each bus shift the results obtained with trace-driven simulations as well as under the uniformity 
and independence assumption (the contact opportunity being obtained from traces). In what follows, we 
illustrate our results with two examples. 



Simulation Results Figures 9 and 11 illustrate our simulation results for two bus shifts. Figures 9(a) 
and 11(a) show the contact opportunities. Note that while in Figure 9(a) the opportunities are roughly 
uniform across different time slots, in Figure 11(a) there is a high concentration of opportunities close to 
Haigis Mall, but not many in between arrivals at Haigis Mall. This, in turn, impacts the results obtained 
with the learning algorithms, as shown in Figures 9(b)-(c) and 11(b)-(c). The dotted lines represent the 
optimal range of bonuses. While in Figure 9(b) the bonus computed using the learning algorithm converges 
to optimal values, in Figure 11(b) the difference between the optimal bonus and the one obtained with the 
learning algorithm was around 20. Despite this difference, note that the number of transmissions 
experienced by the service providers oscillated around its target, eleven, in both cases (see Figures 9(c) 
and 11(c)). 

In Figure 9(c) the number of transmissions remains stable during certain intervals of time (for instance, 
in the interval [20,30]). This is partly due to the synchronization of the users, who are assumed to begin 
with the same initial state and to experience correlated contact opportunities. In contrast, if we consider 
users that experience contact opportunities uniformly at random, such synchronization does not occur and 
the bonus computed using the learning algorithm converges to optimal values in both Figures 9 and 11 (see 
Figures 9(d)-(e) and 11(d)-(e)). 
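The synchronization effect can be reproduced with a toy model. The sketch below is a simplification under 
stated assumptions and is not the authors' simulator: each user follows a threshold policy, the age is reset 
to 1 after an update and grows by one per slot otherwise, and a value of 1 in `contacts` is taken to mean that 
a contact opportunity exists in that slot (a convention chosen for this example only). 

```python
def updates_per_slot(initial_ages, contacts, threshold):
    """Count the updates issued in each slot.

    contacts[u][t] == 1 means user u has a contact opportunity in slot t
    (a convention assumed for this sketch). A user updates when its age has
    reached `threshold` and an opportunity is available; its age is then
    reset to 1. Otherwise the age grows by one slot."""
    ages = list(initial_ages)
    num_slots = len(contacts[0])
    counts = []
    for t in range(num_slots):
        count = 0
        for u in range(len(ages)):
            if ages[u] >= threshold and contacts[u][t] == 1:
                ages[u] = 1
                count += 1
            else:
                ages[u] += 1
        counts.append(count)
    return counts


# Four users in the same bus: identical opportunities and identical initial age.
shared = [1, 0, 1, 1, 0, 1, 1, 1]
counts = updates_per_slot([1, 1, 1, 1], [shared] * 4, threshold=3)
# counts == [0, 0, 4, 0, 0, 4, 0, 0]: in every slot either all users update or none does.
```

Because the users share both the initial age and the opportunities, their ages never diverge, so updates 
arrive in bursts of the whole group. 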

Further insight into the synchronization among users in Figure 9(c) is obtained from Figure 10. Each 
curve in Figure 10 shows the age of the users in each of the buses. Since all the users are assumed to begin 
with age 1 at slot 1, the age of all users in a bus remains equal throughout our simulations. Figure 10 shows 
that from slots 300 to 400 (which correspond to rounds 30 to 40), in each round each user issues one update. 
Since there are 105 users, the average number of updates per slot is 10.5. Note that such synchronization 
does not occur between slots 400 and 450, when the number of updates issued by a user in a given slot varies 
between 1 and 2. The synchronization between users is an artifact due to the assumption that they have the 
same initial state at slot 1, and does not occur if users have their initial states sampled uniformly at random. 

Figure 9: (a) Contact opportunities; (b) and (c) show the bonus and number of transmissions obtained from 
trace-driven simulations; (d) and (e) show the bonus and number of transmissions obtained from simulations 
under uniformity and independence assumptions. 

Figure 11: (a) Contact opportunities; (b) and (c) show the bonus and number of transmissions obtained from 
trace-driven simulations; (d) and (e) show the bonus and number of transmissions obtained from simulations 
under uniformity and independence assumptions. 

Figure 12: Trace-driven reward as a function of the threshold, where the threshold is assumed to be the same 
at all bus shifts (Figure 4(c) with 95% confidence intervals). 